Chapter 5. Monitoring VoltDB Databases

Monitoring is an important aspect of systems administration. This is true of both databases and the infrastructure they run on. The goals for database monitoring include ensuring the database meets its expected performance target as well as identifying and resolving any unexpected changes or infrastructure events (such as server failure or network outage) that can impact the database. This chapter explains:

  • How to monitor overall database health and performance using VoltDB

  • How to automatically pause the database when resource limits are exceeded

  • How to integrate VoltDB monitoring with Prometheus

5.1. Monitoring Overall Database Activity

VoltDB provides several tools for monitoring overall database activity. The following sections describe the three primary monitoring tools within VoltDB:

5.1.1. Volt Management Center

Volt Management Center (VMC) is a browser-based management tool for monitoring, examining, and querying a running VoltDB database. The Management Center is packaged as part of the Volt VMC service that is available separately from the server software. To use VMC, download and unpack the VMC service. You can then start the service using the bin/vmc command either accepting the default server address (if it is running on a Volt database server node) or specifying the server address on the command line. For example:

$ tar -zxf voltdb-vmc-x.x.x.tar -C $HOME
$ cd $HOME
$ mv voltdb-vmc-x.x.x vmc
$ vmc/bin/vmc --servers={volt-server-IP-address}

Once the service is running, you can access VMC from any web browser using the service's IP address and port 8080:

http://voltsvc:8080/

The Volt Management Center provides a graphical display of key aspects of database performance, including throughput, memory usage, query latency, and partition usage. You can also use the Management Center to examine the database schema and to issue ad hoc SQL queries.

5.1.2. System Procedures

VoltDB provides callable system procedures that return detailed information about the usage and performance of the database. In particular, the @Statistics system procedure provides a wide variety of information depending on the selector keyword you give it. Some selectors that are particularly useful for monitoring include the following:

  • MEMORY — Provides statistics about memory usage for each node in the cluster. Information includes the resident set size (RSS) for the server process, the Java heap size, heap usage, available heap memory, and more. This selector provides the type of information displayed by the Process Memory Report, except that it returns information for all nodes of the cluster in a single call.

  • PROCEDUREPROFILE — Summarizes the performance of individual stored procedures. Information includes the minimum, maximum, and average execution time as well as the number of invocations, failures, and so on. The information is summarized from across the cluster as a whole. This selector returns information similar to the latency graph in Volt Management Center.

  • TABLE — Provides information about the size, in number of tuples and amount of memory consumed, for each table in the database. The information is segmented by server and partition, so you can use it to report the total size of the database contents or to evaluate the relative distribution of data across the servers in the cluster.
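As a rough illustration of how the TABLE results can be rolled up, the following sketch totals tuple counts by table and by host. It assumes the rows have already been fetched and converted to dictionaries. The key names (HOSTNAME, TABLE_NAME, TUPLE_COUNT), the function name, and the sample values are assumptions made for this example, so check them against the columns your VoltDB version actually returns.

```python
from collections import defaultdict

def summarize_table_stats(rows):
    """Total tuple counts per table and per host.

    `rows` is an iterable of dicts, one per table/partition/host entry.
    The key names used here are assumptions for this sketch.
    """
    per_table = defaultdict(int)
    per_host = defaultdict(int)
    for row in rows:
        per_table[row["TABLE_NAME"]] += row["TUPLE_COUNT"]
        per_host[row["HOSTNAME"]] += row["TUPLE_COUNT"]
    return dict(per_table), dict(per_host)

# Hypothetical sample data: two hosts, each holding part of two tables.
sample_rows = [
    {"HOSTNAME": "node1", "TABLE_NAME": "ORDERS", "TUPLE_COUNT": 1200},
    {"HOSTNAME": "node2", "TABLE_NAME": "ORDERS", "TUPLE_COUNT": 1150},
    {"HOSTNAME": "node1", "TABLE_NAME": "CUSTOMERS", "TUPLE_COUNT": 300},
    {"HOSTNAME": "node2", "TABLE_NAME": "CUSTOMERS", "TUPLE_COUNT": 310},
]

per_table, per_host = summarize_table_stats(sample_rows)
print(per_table)   # total size of each table across the cluster
print(per_host)    # how the data is distributed across servers
```

Comparing the per-host totals is one simple way to spot an uneven distribution of data across the cluster.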

When using the @Statistics system procedure with the PROCEDUREPROFILE selector for monitoring, it is a good idea to set the second parameter of the call to "1" so that each call returns statistics only for the interval since the previous call. If the second parameter is "0", the procedure returns information accumulated since the database started, and the aggregate results for minimum, maximum, and average execution time will have little meaning.
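To see why averages accumulated since startup carry little meaning, consider the arithmetic in the following sketch. The workload numbers and the function name are hypothetical and chosen only to show how a long healthy history can hide a recent slowdown.

```python
def cumulative_average(intervals):
    """Invocation-weighted average latency across all intervals.

    `intervals` is a list of (invocations, average_latency_ms) pairs.
    """
    total_calls = sum(calls for calls, _ in intervals)
    total_time = sum(calls * avg for calls, avg in intervals)
    return total_time / total_calls

# Hypothetical workload: a long healthy run followed by one slow interval.
history = [(1000000, 1.0)]   # earlier activity: 1,000,000 calls at 1 ms
latest = (1000, 50.0)        # most recent interval: 1,000 calls at 50 ms

since_start = cumulative_average(history + [latest])
since_last_call = latest[1]

print("Average since startup:     %.3f ms" % since_start)
print("Average for last interval: %.1f ms" % since_last_call)
```

In this example the since-startup average barely moves from 1 ms, while the interval average shows the 50 ms slowdown directly, which is the figure a latency alert needs.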

When calling @Statistics with the MEMORY or TABLE selectors, you can set the second parameter to "0" since the results are always a snapshot of the memory usage and table volume at the time of the call. For example, the following Python script uses @Statistics with the MEMORY and PROCEDUREPROFILE selectors to check for memory usage and latency exceeding certain limits. Note that the call to @Statistics uses a second parameter of 1 for the PROCEDUREPROFILE call and a parameter value of 0 for the MEMORY call.

import sys
from voltdbclient import *

# Thresholds that trigger a warning
nano = 1000000000.0
memorytrigger = 4 * (1024*1024)     # 4 gigabytes
avglatencytrigger = .01 * nano      # 10 milliseconds
maxlatencytrigger = 2 * nano        # 2 seconds

# Connect to the server named on the command line (default: localhost)
server = "localhost"
if (len(sys.argv) > 1): server = sys.argv[1]

client = FastSerializer(server, 21212)
stats = VoltProcedure( client, "@Statistics",
   [ FastSerializer.VOLTTYPE_STRING,
     FastSerializer.VOLTTYPE_INTEGER ] )

# Check memory: row[2] is the node name, row[3] is the RSS.
# A second parameter of 0 is fine because the results are a snapshot.
response = stats.call([ "memory", 0 ])
for t in response.tables:
   for row in t.tuples:
      print('RSS for node ' + row[2] + "=" + str(row[3]))
      if (row[3] > memorytrigger):
         print("WARNING: memory usage exceeds limit.")

# Check latency: row[4] is the average and row[6] the maximum
# execution time. A second parameter of 1 returns statistics for
# the interval since the last call. Keep the worst value seen.
response = stats.call([ "procedureprofile", 1 ])
avglatency = 0
maxlatency = 0
for t in response.tables:
   for row in t.tuples:
      if (avglatency < row[4]): avglatency = row[4]
      if (maxlatency < row[6]): maxlatency = row[6]
print('Average latency= ' + str(avglatency))
print('Maximum latency= ' + str(maxlatency))
if (avglatency > avglatencytrigger):
   print("WARNING: Average latency exceeds limit.")
if (maxlatency > maxlatencytrigger):
   print("WARNING: Maximum latency exceeds limit.")

client.close()

The @Statistics system procedure is the source for many of the monitoring options discussed in this chapter. Two other system procedures, @SystemCatalog and @SystemInformation, provide general information about the database schema and cluster configuration respectively and can be used in monitoring as well.

The system procedures are useful for monitoring because they let you customize your reporting to whatever level of detail you wish. The other advantage is that you can automate the monitoring through scripts or client applications that call the system procedures. The downside, of course, is that you must design and create such scripts yourself. As an alternative to custom monitoring, you can consider integrating VoltDB with existing third-party monitoring applications, as described in Section 5.3, “Integrating VoltDB with Prometheus”. You can also set the database to automatically pause if certain system resources run low, as described in Section 5.2, “Setting the Database to Read-Only Mode When System Resources Run Low”.

5.1.3. SNMP Alerts

In addition to monitoring database activity on an "as needed" basis, you can enable VoltDB to proactively send Simple Network Management Protocol (SNMP) alerts whenever important events occur within the cluster. SNMP is a standard for how SNMP agents send messages (known as "traps") to management servers or "management stations".

SNMP is a lightweight protocol. SNMP traps are sent as UDP broadcast messages in a standard format that is readable by SNMP management stations. Because they are broadcast messages, the sending agent does not wait for a confirmation or response, and it makes no difference to the sender whether a management server is listening for the message. You can use any SNMP-compliant management server to receive and take action based on the traps.
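The send-and-forget behavior of UDP can be sketched with Python's standard socket module. This is only an illustration of the transport, under stated assumptions: the payload is a placeholder rather than a real SNMP-encoded trap, the datagram is a plain unicast message on the loopback interface rather than a broadcast, and the listener stands in for a management station.

```python
import socket

def send_datagram(payload, host, port):
    """Send one UDP datagram and return immediately; no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        return sender.sendto(payload, (host, port))

# A stand-in "management station": a UDP socket on an OS-assigned local port.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
listener.settimeout(2.0)
port = listener.getsockname()[1]

# The payload is a placeholder, not a real SNMP-encoded trap.
sent = send_datagram(b"placeholder-trap", "127.0.0.1", port)
received, _ = listener.recvfrom(1024)
listener.close()

print("sent %d bytes, listener received %r" % (sent, received))
```

Note that send_datagram returns as soon as the datagram is handed to the operating system. It never learns whether anything received the message, which is why a management station must be listening before the events of interest occur.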

When you enable SNMP in the configuration, VoltDB operates as an SNMP agent sending traps whenever management changes occur in the cluster. You enable SNMP with deployment.snmp and its subproperties. You configure how and where VoltDB sends SNMP traps using one or more of the properties listed in Table 5.1, “SNMP Configuration Properties”.

Table 5.1. SNMP Configuration Properties

Property | Default Value | Description
deployment.snmp.target | (none) | Specifies the IP address or host name of the SNMP management station where traps will be sent, in the form {IP-or-hostname}[:port-number]. If you do not specify a port number, the default is 162. The target property is required.
deployment.snmp.community | public | Specifies the name of the "community" the VoltDB agent belongs to.
deployment.snmp.username | (none) | Specifies the username for SNMP V3 authentication. If you do not specify a username, VoltDB sends traps in SNMP V2c format. If you specify a username, VoltDB uses SNMP V3 and the following properties let you configure the authentication mechanisms used.
deployment.snmp.authprotocol | SHA (SNMP V3 only) | Specifies the authentication protocol for SNMP V3. Allowable options are SHA, MD5, and NoAuth.
deployment.snmp.authkey | voltdbauthkey (SNMP V3 only) | Specifies the authentication key for SNMP V3 when the protocol is other than NoAuth.
deployment.snmp.privacyprotocol | AES (SNMP V3 only) | Specifies the privacy protocol for SNMP V3. Allowable options include AES, NoPriv, 3DES [*], AES192 [*], and AES256 [*].
deployment.snmp.privacykey | voltdbprivacykey (SNMP V3 only) | Specifies the privacy key for SNMP V3 when the privacy protocol is other than NoPriv.

[*] Use of 3DES, AES192, or AES256 privacy requires that the Java Cryptography Extension (JCE) be installed on the system. The JCE is specific to the version of Java you are running. See the Java web site for details.
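The target format and the V2c/V3 rule described in Table 5.1 can be sketched as follows. The function names are hypothetical helpers written for this example and are not part of VoltDB. The parser handles only host names and IPv4 addresses, since it splits on the first colon.

```python
DEFAULT_TRAP_PORT = 162  # default port stated in Table 5.1

def parse_snmp_target(target):
    """Split '{IP-or-hostname}[:port-number]' into (host, port)."""
    host, sep, port = target.partition(":")
    if not host:
        raise ValueError("target must include an IP address or host name")
    return host, (int(port) if sep else DEFAULT_TRAP_PORT)

def trap_version(username=None):
    """Per Table 5.1: no username means SNMP V2c, a username means V3."""
    return "V3" if username else "V2c"

print(parse_snmp_target("mgtsvr.mycompany.com"))
print(parse_snmp_target("mgtsvr.mycompany.com:1162"))
print(trap_version())
print(trap_version("voltdb"))
```

A check like this can be useful in deployment scripts to catch a missing or malformed target before the configuration is applied.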


SNMP is enabled when you set the deployment.snmp.enabled property to true. For example, the following configuration enables SNMP alerts, sending traps to mgtsvr.mycompany.com using SNMP V3 with the username "voltdb":

deployment:
  snmp:
    enabled: true
    target: mgtsvr.mycompany.com
    username: voltdb

Once SNMP is enabled, VoltDB sends alerts for the events listed in Table 5.2, “SNMP Events”.

Table 5.2. SNMP Events

Name | Severity | Description
crash | FATAL | When a server or cluster crashes.
clusterPaused | INFO | When the cluster pauses and enters admin mode.
clusterResume | INFO | When the cluster exits admin mode and resumes normal operation.
hostDown | ERROR | When a server shuts down or is recognized as having left the cluster.
hostUp | INFO | When a server joins the cluster.
streamBlocked | WARN | When an export stream is blocked due to data missing from the export queue and all cluster nodes are running.
statisticsTrigger | WARN | When certain operational states are compromised. Specifically, when a K-safe cluster loses one or more nodes, or when, using database replication, the connection to the remote cluster is broken.
resourceTrigger | WARN | When certain resource limits are exceeded, specifically memory usage or disk usage. See Section 5.2, “Setting the Database to Read-Only Mode When System Resources Run Low” for more information about configuring SNMP alerts for resources.
resourceClear | INFO | When resource limits return to levels below the trigger value.

For the latest details about each event trap, see the VoltDB SNMP Management Information Base (MIB), which is installed with the VoltDB server software in the file /tools/snmp/VOLTDB-MIB in the installation directory.
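On the receiving side, a management script might use the severities in Table 5.2 to decide which traps deserve immediate attention. The following is a minimal sketch of that idea. The event names and severities are copied from the table, while the severity ordering and the function name are assumptions of this example.

```python
# Event names and severities as listed in Table 5.2.
EVENT_SEVERITY = {
    "crash": "FATAL",
    "clusterPaused": "INFO",
    "clusterResume": "INFO",
    "hostDown": "ERROR",
    "hostUp": "INFO",
    "streamBlocked": "WARN",
    "statisticsTrigger": "WARN",
    "resourceTrigger": "WARN",
    "resourceClear": "INFO",
}

# Ordering used to compare severities (an assumption of this sketch).
SEVERITY_RANK = {"INFO": 0, "WARN": 1, "ERROR": 2, "FATAL": 3}

def events_at_or_above(threshold):
    """Return the event names whose severity is at least `threshold`."""
    rank = SEVERITY_RANK[threshold]
    return sorted(name for name, severity in EVENT_SEVERITY.items()
                  if SEVERITY_RANK[severity] >= rank)

print(events_at_or_above("ERROR"))
print(events_at_or_above("WARN"))
```

For example, a team might page an operator for events at ERROR or above and simply log the rest.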