Chapter 8. Creating Custom SQL Functions

Documentation

VoltDB Home » Documentation » Guide to Performance and Customization

Chapter 8. Creating Custom SQL Functions

VoltDB provides many built-in functions for use in SQL statements that are described in an appendix to the Using VoltDB manual. These built-in functions perform a variety of tasks, such as data conversion, string matching, geometric calculations, and so on.

There are two types of SQL functions: scalar functions and aggregate functions. A scalar function takes a set of arguments and produces a single result value. DATEADD() is an example of a scalar function that takes a timestamp, a value, and a unit type such as MINUTE or HOUR, and produces a value that represents the addition of the specified units to the input timestamp. An aggregate functions takes multiple values — a table column or expression — as part of a query and produces a value that aggregates the results from all of the input. MAX() is an example of a built-in aggregate function that calculates the maximum value of all the inputs associated with the query's constraints.

However, not all possible functions are built in and there may be cases where you have an application-specific function that needs to be performed repeatedly. Rather than writing code to perform the operation as part of the application or within a stored procedure, VoltDB lets you create and declare your own functions that can be invoked directly from within SQL queries and data manipulation statements just like built-in functions.

Just as there are two types of built-in functions, you can create two types of user-defined functions. For user-defined scalar functions, you write the function as a Java method that takes the specified input arguments and returns a single value. User-defined aggregate functions are slightly more complex. You create aggregate functions as a Java class with specific methods to initiate the function, collect and process the input values, and return the aggregated result.

In both cases, there are the same three steps to creating a user-defined function:

  1. Write the code to perform the function as a Java method (for scalar functions) or a Java class (for aggregate functions).

  2. Load the Java class that includes the user-defined function into the database.

  3. Declare the function using the CREATE FUNCTION or CREATE AGGREGATE FUNCTION statement, associating it to the Java class and/or method.

The following sections describe how to perform each of these tasks, as well as how to invoke the function in SQL statements once it has been declared.

8.1. Writing a User-Defined Scalar Function

You write user-defined scalar functions as Java methods. If you are creating multiple scalar functions, you can put them all in a single Java class or in separate Java classes. Whichever is most convenient for you and the management of your code base.

The number and datatypes of the method's parameters define the number and types of the function's arguments. For example, if you declare the method as having two parameters, a Java int and String, the function will have two arguments, a VoltDB INTEGER and a VARCHAR. Similarly, the return datatype of the method itself determines the datatype that the function returns.

Because user-defined functions are executable within SQL statements and stored procedures, the methods must obey the same rules concerning determinism as stored procedures. That is, avoid any actions that introduce values that may vary from one system to another, such as system time, random number generation, or I/O with indeterminate results. See the section on determinism in the Using VoltDB manual for details.

For example, say you need to convert distances from imperial or US units to metric measurements. You might define your function with two arguments: a floating-point value representing the measurement and a string unit identifying the units (such as "feet", "yards", or "miles"). So your Java source code would need to declare the method as accepting two parameters: a double and a String. It should also be declared as returning a double value.

package myapp.sql.functions;
import org.voltdb.*;
public class Conversion {
 
  public double us2metric( double value, String units )
                throws VoltAbortException {

Note the method is declared as throwing a VoltAbortException. This is useful for error handling. By throwing a VoltAbortException, it is possible for the function to handle errors gracefully and notify the VoltDB runtime to rollback the current transaction. For example, the first step in the method might be to validate the input and make sure the units argument has a known value:

units = units.toUpperCase().trim();
if (!units.equals("FEET") && 
    !units.equals("YARDS") && 
    !units.equals("MILES") ) 
    throw new VoltAbortException("Unrecognized selector.");

The bulk of the method will focus on performing the actual task the function is designed for. The key point is to make sure it returns the appropriate datatype object that correlates to the VoltDB datatype you want the function to return in the SQL statement. In the previous example, the method is declared as returning a double, which matches the VoltDB FLOAT type. See the appendix on Datatype compatibility in the Using VoltDB manual for details on the mapping of Java and VoltDB datatypes. But, in brief, the mapping of Java to VoltDB datatypes is as follows:

byte or Byte → TINYINT
short or Short → SMALLINT
int or Integer → INTEGER
long or Long → BIGINT
double or Double → FLOAT
BigDecimal → DECIMAL
String → VARCHAR
byte[] or Byte[] → VARBINARY

You can define parameters for VoltDB-specific datatypes by using the object types included in the VoltDB Java packages. For example:

org.voltdb.types.GeographyValue → GEOGRAPHY
org.voltdb.types.GeographyPointValue → GEOGRAPHY_POINT
org.voltdb.types.TimestampType → TIMESTAMP