Saturday, January 30, 2016

1902033 - How to handle HANA Alert 3: ‘Inactive Services’

Symptom
When you check the Alerts tab in HANA, you find an alert called 'Unexpected state STOPPING found when starting StatisticsServer'.
  • In HANA Studio, you can find this alert under Administration Console -> Alerts -> Show: all alerts
  • In Solution Manager, you can find this alert in transaction DBACOCKPIT -> choose the HANA system -> expand Current Status -> Alerts
To check the alert details and the exact time the error occurred, go to HANA Studio -> Administration Console -> Alerts and double-click the alert.


Environment
All tests were performed on HANA 1.0 revision 45.


Cause
The following prerequisite is met:
The affected service is not included in the daemon configuration file, or the services are set to be restarted automatically according to the configuration.
There are normally three reasons for this alert:
1. Services were stopped or killed manually for a specific purpose. For example, the DB administrator may have triggered the alert by stopping or killing services from HANA Studio.
You can check the Daemon Trace File for details. It shows which service is inactive and why.
a. Check the Daemon Trace File in HANA Studio.
b. Check the Daemon Trace File at OS level:
The file is usually located in the directory /usr/sap/<SID>/HDB<InstanceNumber>/<hostname>/trace.
For example, if the nameserver is killed from HANA Studio, the daemon trace file records the following information:
TrexDaemon.cpp(10226) : process hdbnameserver with pid 43929 exited because it caught signal 9
[43906]{0}[0] 2013-01-28 08:05:10.377704 w Basis        ProcessExecution.cpp(00099) : Active Context before fork ID: 43908 Name: NetworkChannelCompletionThread State: Inactive
[43906]{0}[0] 2013-01-28 08:05:10.377718 w Basis        ProcessExecution.cpp(00099) : Active Context before fork ID: 43909 Name: NetworkChannelCompletionThread State: Inactive
[43906]{0}[0] 2013-01-28 08:05:10.377725 i Daemon       TrexDaemon.cpp(08656) : start 'hdbnameserver' as process 76615
[43906]{0}[0] 2013-01-28 08:05:17.727685 i Daemon       TrexDaemon.cpp(10341) : program hdbnameserver with pid 76615 is started
[43906]{0}[0] 2013-01-28 08:05:17.727720 i Daemon       TrexDaemon.cpp(10355) : runlevel 5 completely started
2. The HANA host crashed or restarted for an unexpected reason.
3. The HANA DB is not running the latest revision.
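The trace check described under reason 1 can be scripted. The Python sketch below is a minimal illustration, assuming the daemon trace is plain text and that its messages look exactly like the excerpt above; the function names daemon_trace_dir and parse_daemon_events, and the SID "ABC", instance "00" and host "hanahost", are hypothetical and not part of any SAP tool. It builds the trace directory from the path pattern given above and extracts process exits (with the signal number) and restarts.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper: builds the directory from the pattern
# /usr/sap/<SID>/HDB<InstanceNumber>/<hostname>/trace described above.
def daemon_trace_dir(sid: str, instance_number: str, hostname: str) -> PurePosixPath:
    return PurePosixPath("/usr/sap") / sid / f"HDB{instance_number}" / hostname / "trace"

# Assumed message formats, copied from the trace excerpt above.
EXIT_RE = re.compile(r"process (\S+) with pid (\d+) exited because it caught signal (\d+)")
START_RE = re.compile(r"program (\S+) with pid (\d+) is started")

def parse_daemon_events(lines):
    """Return (event, service, pid, signal) tuples in file order.

    event is "exited" or "started"; signal is None for "started" events.
    """
    events = []
    for line in lines:
        exited = EXIT_RE.search(line)
        if exited:
            service, pid, signum = exited.groups()
            events.append(("exited", service, int(pid), int(signum)))
            continue
        started = START_RE.search(line)
        if started:
            service, pid = started.groups()
            events.append(("started", service, int(pid), None))
    return events

if __name__ == "__main__":
    # Hypothetical SID "ABC", instance "00" and host "hanahost".
    print(daemon_trace_dir("ABC", "00", "hanahost"))  # /usr/sap/ABC/HDB00/hanahost/trace
    sample = [
        "TrexDaemon.cpp(10226) : process hdbnameserver with pid 43929 exited because it caught signal 9",
        "[43906]{0}[0] 2013-01-28 08:05:17.727685 i Daemon       TrexDaemon.cpp(10341) : program hdbnameserver with pid 76615 is started",
    ]
    for event in parse_daemon_events(sample):
        print(event)  # ('exited', 'hdbnameserver', 43929, 9), then ('started', 'hdbnameserver', 76615, None)
```

On a real system you would read the lines of the daemon trace file in that directory and pass them to parse_daemon_events; the exact file name depends on the host and is not assumed here. Matching on the message text rather than on column positions keeps the sketch tolerant of the differing line prefixes seen in the excerpt.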


Resolution
1. If the service was stopped or killed manually for a specific purpose, you can ignore the alert.
2. For any inactive service alert, check the Daemon Trace File to find out why the service became inactive.
  • Check the Daemon Trace File in HANA Studio.
  • Check the Daemon Trace File at OS level. The file is usually located in the directory /usr/sap/<SID>/HDB<InstanceNumber>/<hostname>/trace.
Example: in the daemon trace excerpt shown under Cause, hdbnameserver with the old pid 43929 became inactive because it caught signal 9, which means it was killed with kill -9. The daemon then started hdbnameserver again as process 76615 and reported runlevel 5 as completely started.
3. If the HANA DB is not on the latest revision, upgrading it to the latest revision is strongly recommended.
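As a rough illustration, the three resolution steps can be written as a small decision function. This is a hypothetical Python simplification, not an SAP tool: the names signal_name and triage are invented here, and the inputs (the exit signal read from the daemon trace, whether the stop was intentional, whether the revision is current) are assumptions about facts you would gather yourself. Treating "no signal in the trace" as a pointer to a host crash or restart (reason 2) is also a simplifying assumption.

```python
import signal
from typing import List, Optional

def signal_name(signum: int) -> str:
    """Translate a signal number from the daemon trace (e.g. 9) into its name."""
    try:
        return signal.Signals(signum).name  # 9 -> "SIGKILL" on Linux
    except ValueError:
        # Signal number not known on this platform; fall back to the number.
        return f"signal {signum}"

def triage(exit_signal: Optional[int], stopped_on_purpose: bool,
           on_latest_revision: bool) -> List[str]:
    """Hypothetical helper mapping the findings to the resolution steps above."""
    actions = []
    if stopped_on_purpose:
        # Step 1: an intentional stop or kill needs no further action.
        actions.append("Stopped or killed on purpose: the alert can be ignored.")
    elif exit_signal is not None:
        # Step 2: the daemon trace names the signal that ended the process.
        actions.append(f"Terminated by {signal_name(exit_signal)}: "
                       "find out who or what sent the signal.")
    else:
        # Simplifying assumption: no signal in the trace points to reason 2.
        actions.append("No exit signal in the daemon trace: "
                       "check whether the host crashed or restarted.")
    if not on_latest_revision:
        # Step 3: recommend an upgrade regardless of the cause.
        actions.append("Upgrade the HANA DB to the latest revision.")
    return actions

if __name__ == "__main__":
    for action in triage(exit_signal=9, stopped_on_purpose=False, on_latest_revision=False):
        print(action)
```

The revision check is kept independent of the cause, which mirrors step 3 being listed as a separate recommendation rather than as an alternative to steps 1 and 2.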


See Also
  • Are there any Functional Constraints?
    • While services are inactive, the HANA system is unavailable.
  • Are there any Non-functional Constraints? No
  • Are there any Side-effects? No
  • Is there any suggestion to avoid this alert?
    • For reason 1, never use kill against HANA processes in a production environment.
    • For reason 3, update HANA to the latest available revision.


Keywords

Inactive, kill services, restarted, Operations Recommendation, #OpsRec-HANA
