Sunday, February 7, 2016

2081065 - Troubleshooting SAP HANA Network

Symptom
You experience performance issues in your SAP HANA landscape and suspect that the network is the cause.
You also see high DB access times, and the threads show the state Network Read, which means that network I/O reads are taking place between SAP HANA nodes.


Environment
SAP HANA Database SPS 08 and higher.
The instructions in this Knowledge Base Article were created on SAP HANA Database Revision 80 (Support Package Stack SPS 08) and may differ in other revisions.


Cause
You experience performance degradation and would like to investigate whether the network might be the reason.
The following common factors affect network performance, and each of them can degrade throughput:
  • Latency - The delay, i.e. the time it takes for a packet to cross a network connection from sender to receiver.
  • Bandwidth - The amount of data that can be carried from one point to another in a given time period (usually a second). Network bandwidth is usually expressed in bits per second (bps).
  • Packet loss - The failure of one or more transmitted packets to arrive at their destination. When packets are lost on the network interfaces, the sender must retransmit them, which the end user experiences as delays and poor performance.
There could be communication difficulties between two hosts, e.g.:
  • Between SAP HANA hosts
  • Between SAP HANA database and client application
  • Between primary and replicated SAP HANA database


Resolution

Procedure                                                                    

  1. Analyze Round Trip Time (RTT) between Server and Client
  2. Analyze internode communication
  3. Further analysis recommendations

PATH

  • SAP HANA Studio - SAP HANA Administration Console - SQL Console
  • SAP HANA Studio - SAP HANA Administration Console - Configuration
  • HDBAdmin Tool - Hosts - Network
  • Operating System

How-To

1.       Analyze Round Trip Time (RTT) between Server and Client

To analyze the Round Trip Time (RTT) between the server and the client, use the SQL statement "HANA_Network_Clients" from SAP Note 1969700.
Enable the following parameter before executing the SQL statement, because data is only collected while it is set:
indexserver.ini --> [sql_client_network_io] --> enabled = true
The SQL statement reports the client, the server, the sum and average of the RTT, and the total and average size in KB.
According to SAP Note 1100926, these are the KPIs for the RTT:
Rating                KPI
Good value            Roundtrip time <= 0.3 ms
Moderate value        0.3 ms < roundtrip time <= 0.7 ms
Below average value   Roundtrip time > 0.7 ms
If the average RTT is higher than these KPIs, investigate whether this affects only one client/application server or whether the RTT is poor for all application servers.
If only one client/application server is affected, investigate that client/application server.
If the RTT is high between the server and all application servers/clients, investigate the network traffic between the server and the application servers in detail.
For a detailed analysis, capture a tcpdump trace of the network traffic between the client and the HANA DB (all nodes) to determine whether packet losses occur.
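The rating and the "one client or all clients" decision described above can be sketched in a few lines of Python. This is only an illustration: the thresholds are the KPIs quoted from SAP Note 1100926, while the function names and the client names (app1, app2) are hypothetical, and the average RTT values are assumed to have been copied from the output of the SQL statement.

```python
from typing import Dict, List, Tuple

GOOD_MAX_MS = 0.3      # Good value: RTT <= 0.3 ms
MODERATE_MAX_MS = 0.7  # Moderate value: 0.3 ms < RTT <= 0.7 ms


def rate_rtt(avg_rtt_ms: float) -> str:
    """Map an average RTT in milliseconds to the KPI rating."""
    if avg_rtt_ms <= GOOD_MAX_MS:
        return "Good"
    if avg_rtt_ms <= MODERATE_MAX_MS:
        return "Moderate"
    return "Below average"


def affected_scope(avg_rtt_by_client: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return ("none" | "some" | "all", sorted list of clients rated below average)."""
    poor = sorted(c for c, rtt in avg_rtt_by_client.items()
                  if rate_rtt(rtt) == "Below average")
    if not poor:
        return "none", poor
    if len(poor) == len(avg_rtt_by_client):
        return "all", poor
    return "some", poor


if __name__ == "__main__":
    # Hypothetical client names and RTT values, for illustration only.
    scope, clients = affected_scope({"app1": 0.2, "app2": 0.9})
    print(scope, clients)
```

A result of "some" points to the listed clients/application servers, while "all" points to the network between the server and the application servers.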


2.       Analyze internode communication

The DB execution time for the same query varies considerably (sporadically), and you suspect that the network might be the cause. In that case, proceed with the analysis as described below.
To analyze internode communication within the SAP HANA landscape, use the SQL statement "HANA_Network_Services" from SAP Note 1969700.
It shows the size of the data transferred between the nodes and the throughput.



You can also obtain throughput information with the HDBAdmin tool.
Start HDBAdmin Tool - Host - Network
Perform a network test by sending packets of 1 MB, 10 MB and 100 MB from each node to itself and to the other nodes. When the test completes, it displays the throughput achieved.
The drawback of the HDBAdmin tool is that the packet sizes are fixed at 1, 10 and 100 MB.



You can find the KPIs for data throughput and latency for production SAP HANA systems in the SAP HANA Administration Guide (section: Data Throughput and Latency KPIs).
The test from a node to itself is always faster than the test to other nodes, so the throughput to itself is higher. A brighter color means the value is good; a transparent color indicates a problem.
In the example test run, the Min/Avg/Max values for sending a packet size of 100 MB are 77.8/105.5/171.9 MB/sec. This is lower than the KPI and therefore requires deeper investigation.
Alternatively, you can use SAP's NIPING program to measure network metrics and analyze the network connections between any two machines running SAP software. For further information on NIPING, see SAP Note 500235.
Hardware vendors may run a similar performance test, known as an Iperf test, to determine bandwidth and throughput. However, the Iperf tool must be installed on each host.
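Whichever tool produces the measurements, the Min/Avg/Max throughput figures are simple arithmetic over transfer size and elapsed time. The following Python sketch shows that calculation. The function names are hypothetical, the durations are made-up sample values, and the KPI threshold is deliberately left as a parameter that has to be taken from the SAP HANA Administration Guide.

```python
from statistics import mean
from typing import Iterable, Tuple


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Throughput of one transfer: size divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mb / seconds


def summarize(size_mb: float, durations_s: Iterable[float]) -> Tuple[float, float, float]:
    """Return (min, avg, max) throughput in MB/s over several transfers of equal size."""
    rates = [throughput_mb_per_s(size_mb, s) for s in durations_s]
    if not rates:
        raise ValueError("at least one measurement is required")
    return min(rates), mean(rates), max(rates)


def below_kpi(avg_mb_per_s: float, kpi_mb_per_s: float) -> bool:
    """True if the average throughput is lower than the KPI supplied by the caller."""
    return avg_mb_per_s < kpi_mb_per_s


if __name__ == "__main__":
    # Three hypothetical 100 MB transfers taking 1.0 s, 0.5 s and 2.0 s.
    print(summarize(100, [1.0, 0.5, 2.0]))
```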

To analyze intranode communication within the SAP HANA landscape, use the SQL statement "HANA_Network_Services" from SAP Note 1969700.
Make sure that INTRA_HOST_COMMUNICATION is set to 'X' in the modification section.

3.       Further analysis recommendations

If the initial investigation points to network issues, proceed as follows:
Collect information about the network interface configuration:
Command         Usage
ifconfig        Displays all active interfaces
ifconfig -a     Displays all interfaces, active and inactive
ifconfig eth0   Displays a specific interface (here eth0)



The output of ifconfig gives a first indication of whether packet losses occurred on the network interfaces.
RX packets: 214312154 errors: 0 dropped: 563860 overruns: 0 frame: 0
TX packets: 74772169 errors: 0 dropped: 0 overruns: 0 carrier: 0
RX – Received packets; TX – Transmitted packets
If a high number of dropped packets is seen, capture a tcpdump trace for a detailed investigation.
The RX and TX lines show how many packets were received or transmitted without errors, how many errors occurred, how many packets were dropped (probably because of low memory) and how many were lost because of an overrun. Receiver overruns usually occur when packets come in faster than the kernel can service the last interrupt.
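To put the dropped counter into proportion, it can be compared with the total packet count. The Python sketch below parses the two counter lines shown above and computes the share of dropped packets. It assumes the output format of the sample (ifconfig output differs between versions), and the function names are hypothetical.

```python
import re
from typing import Dict

# The two counter lines from the sample ifconfig output above.
SAMPLE_OUTPUT = """\
RX packets: 214312154 errors: 0 dropped: 563860 overruns: 0 frame: 0
TX packets: 74772169 errors: 0 dropped: 0 overruns: 0 carrier: 0
"""

COUNTER_LINE = re.compile(r"^\s*(RX|TX) packets:\s*(\d+)\s+(.*)$")


def parse_counters(output: str) -> Dict[str, Dict[str, int]]:
    """Extract the RX/TX counters from ifconfig-style 'name: value' lines."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        match = COUNTER_LINE.match(line)
        if not match:
            continue  # not a counter line
        direction, packets, rest = match.groups()
        counters = {"packets": int(packets)}
        for name, value in re.findall(r"(\w+):\s*(\d+)", rest):
            counters[name] = int(value)
        stats[direction] = counters
    return stats


def drop_ratio(counters: Dict[str, int]) -> float:
    """Dropped packets as a fraction of the packet count (0.0 if no packets)."""
    packets = counters["packets"]
    return counters.get("dropped", 0) / packets if packets else 0.0


if __name__ == "__main__":
    for direction, counters in parse_counters(SAMPLE_OUTPUT).items():
        print(f"{direction}: {drop_ratio(counters):.4%} dropped")
```

For the sample, roughly 0.26 % of the received packets were dropped, while no transmitted packets were dropped.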
MTU stands for Maximum Transmission Unit, the largest packet size that can be transmitted from sender to receiver without being fragmented. By default this value is set to 1500 bytes, so any packet larger than 1500 bytes must be fragmented before it can be forwarded.
On SAP HANA appliances this value is usually set to 9000 (also known as jumbo frames) for the network interfaces used by HANA to achieve better performance.
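The effect of the MTU on the number of packets can be illustrated with a small calculation. The Python sketch below counts the IPv4 fragments needed for a given payload, assuming a 20-byte IPv4 header without options and fragment data sizes that are multiples of 8 bytes. The function name and the 64000-byte payload are hypothetical, and TCP normally avoids fragmentation by sizing its segments to the MTU, so treat this purely as an illustration of the packet-count difference.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def fragments_needed(payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry payload_bytes over a link with this MTU."""
    if payload_bytes < 0 or mtu <= IPV4_HEADER_BYTES + 8:
        raise ValueError("invalid payload size or MTU")
    max_unfragmented = mtu - IPV4_HEADER_BYTES
    if payload_bytes <= max_unfragmented:
        return 1  # fits into a single packet
    # All fragments except the last must carry a multiple of 8 bytes.
    per_fragment = (max_unfragmented // 8) * 8
    return -(-payload_bytes // per_fragment)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, fragments_needed(64000, mtu))
```

With these assumptions a 64000-byte payload needs 44 packets at an MTU of 1500 and 8 packets at an MTU of 9000.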
As a final step, perform a detailed analysis by capturing tcpdump traces while reproducing the issue and send the files to SAP.
Capture tcpdump on all HANA nodes and on the application servers. Start the capture on all nodes first and then reproduce the issue in the application, so that all TCP traffic is captured.
Commands for tcpdump
On the client (application server):
tcpdump -s96 -n -S -p -w /tmp/client_stats.pcap 'tcp and port HANA_listen_port'
On the HANA DB (all nodes):
tcpdump -s96 -n -S -p -w /tmp/server_stats.pcap 'tcp'
Options:
-s – Snapshot length (number of bytes captured per packet)
-n – Don't convert addresses (i.e., host addresses, port numbers, etc.) to names
-S – Print absolute, rather than relative, TCP sequence numbers
-p – Don't put the interface into promiscuous mode
-w – Write the raw packets to file rather than parsing and printing them out
To capture only specific ports, adapt the filter to 'tcp and (port p1 or port p2 or port p3 …)'.
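When several ports are involved, the filter expression can be generated instead of typed by hand. The Python sketch below builds the filter and the tcpdump argument list with the options used above. The function names are hypothetical, and the port numbers 30015 and 30017 are example values only; use the ports of your own system.

```python
import shlex
from typing import List, Sequence


def build_filter(ports: Sequence[int] = ()) -> str:
    """Build the capture filter: plain 'tcp', or 'tcp and (port p1 or port p2 ...)'."""
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
    if not ports:
        return "tcp"
    clauses = " or ".join(f"port {port}" for port in ports)
    return f"tcp and ({clauses})"


def build_tcpdump_command(outfile: str, ports: Sequence[int] = ()) -> List[str]:
    """Argument list for tcpdump with the options described above."""
    return ["tcpdump", "-s96", "-n", "-S", "-p", "-w", outfile, build_filter(ports)]


if __name__ == "__main__":
    # Example ports only (assumption); print a shell-quoted command line.
    command = build_tcpdump_command("/tmp/client_stats.pcap", [30015, 30017])
    print(shlex.join(command))
```

The argument list can be passed to subprocess.run on a host where tcpdump is installed and the user has the required privileges.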
If your application server runs on Windows, you can use windump with the same options to capture the trace. Make sure that the path to which the file is written exists.
Once the tcpdump traces are captured, forward the files to SAP for further investigation.


Keywords
HANA network, tcpdump, ifconfig, NIPING, Iperf



Header Data

Released On 28.01.2015 16:21:56
Release Status Released to Customer
Component HAN-DB SAP HANA Database
Priority Normal
Category How To
