Alarms

In a production environment, the STH component is typically monitored using the following alarms. Each entry also includes guidelines on how to react when the alarm is raised:

| Alarm ID | Severity | Detection strategy | Stop condition | Description | Action |
|---|---|---|---|---|---|
| 1 | CRITICAL | lvl=ERROR + corr=NA + trans=NA + op=OPER_STH_SHUTDOWN in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when connecting to MongoDB, error when starting the Hapi server, or any uncaught exception. | 1. Check the logs to infer the concrete error.<br>2. If the error occurred when connecting to MongoDB:<br>2.1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2.2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. If it is any other error:<br>3.1. Restart the STH server.<br>3.2. Contact the development team to inform them about this error. |
| 2 | CRITICAL | lvl=ERROR + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Internal Hapi server error. | 1. Restart the STH server.<br>2. Contact the development team to inform them about this error. |
| 3 | WARNING | lvl=ERROR and msg=Error when getting data from collection in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when getting raw or aggregated data from a MongoDB collection. | 1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. Contact the development team to inform them about this error. |
| 4 | WARNING | lvl=ERROR and msg=Error when getting the collection in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when getting the collection in MongoDB from which the raw or aggregated data should be retrieved. | 1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. The problem could be related to the limit MongoDB imposes on the maximum namespace size (for further information, see the limits documentation for the concrete MongoDB instance version).<br>4. Contact the development team to inform them about this error. |
| 5 | WARNING | lvl=ERROR and msg=Error when storing the raw data associated to a notification event in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when storing raw data in the corresponding MongoDB collection. | 1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. Contact the development team to inform them about this error. |
| 6 | WARNING | lvl=ERROR and msg=Error when storing the aggregated data associated to a notification event in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when storing aggregated data in the corresponding MongoDB collection. | 1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. Contact the development team to inform them about this error. |
| 7 | WARNING | lvl=ERROR and msg=Error when creating the index for TTL for collection in the log messages. | lvl=INFO + corr=NA + trans=NA + op=OPER_STH_SERVER_LOG and msg containing 'Everything OK' in the log messages. | Error when creating the index to force TTL in the newly created collection. | 1. Check that the MongoDB instance or replica set is running. If not, start it up.<br>2. Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If not accessible, make it accessible.<br>3. Contact the development team to inform them about this error. |
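
The detection strategies and the shared stop condition listed above could be evaluated against a log stream roughly as follows. This is only a minimal sketch, not part of the STH itself. It assumes that each log line is a sequence of `key=value` fields separated by ` | `, that the `msg` field starts with the text given in the table, and that the CRITICAL alarms take precedence when several rules match. The function names (`parse_log_line`, `detect_alarm`, `is_stop_condition`, `active_alarms`) are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# Message prefixes for the WARNING alarms, copied from the table above.
# Assumption: the real msg field starts with this text.
MESSAGE_ALARMS: Dict[int, str] = {
    3: "Error when getting data from collection",
    4: "Error when getting the collection",
    5: "Error when storing the raw data associated to a notification event",
    6: "Error when storing the aggregated data associated to a notification event",
    7: "Error when creating the index for TTL for collection",
}


def parse_log_line(line: str) -> Dict[str, str]:
    """Split a log line into fields (assumed format: 'key=value | key=value')."""
    fields: Dict[str, str] = {}
    for part in line.strip().split(" | "):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without a key=value shape
            fields[key.strip()] = value.strip()
    return fields


def detect_alarm(fields: Dict[str, str]) -> Optional[int]:
    """Return the ID of the alarm raised by this log entry, or None."""
    if fields.get("lvl") != "ERROR":
        return None
    # CRITICAL alarms 1 and 2 are checked first (an assumption about precedence).
    if fields.get("corr") == "NA" and fields.get("trans") == "NA":
        if fields.get("op") == "OPER_STH_SHUTDOWN":
            return 1
        if fields.get("op") == "OPER_STH_SERVER_LOG":
            return 2
    msg = fields.get("msg", "")
    for alarm_id, prefix in MESSAGE_ALARMS.items():
        if msg.startswith(prefix):
            return alarm_id
    return None


def is_stop_condition(fields: Dict[str, str]) -> bool:
    """True when the entry matches the stop condition shared by all alarms."""
    return (
        fields.get("lvl") == "INFO"
        and fields.get("corr") == "NA"
        and fields.get("trans") == "NA"
        and fields.get("op") == "OPER_STH_SERVER_LOG"
        and "Everything OK" in fields.get("msg", "")
    )


def active_alarms(lines: Iterable[str]) -> Set[int]:
    """Replay log lines in order and return the alarm IDs still raised."""
    active: Set[int] = set()
    for line in lines:
        fields = parse_log_line(line)
        if is_stop_condition(fields):
            active.clear()  # the same stop condition clears every alarm
            continue
        alarm_id = detect_alarm(fields)
        if alarm_id is not None:
            active.add(alarm_id)
    return active
```

Feeding `active_alarms` the lines of a log file in order yields the set of alarm IDs that have been raised and not yet cleared by an 'Everything OK' entry. A real monitoring setup would more likely express the same rules in its own alerting tool.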