A formatter is a module that takes a row of data received by an import connector, interprets the contents, and translates it into individual column values. The default formatter that is provided with VoltDB parses comma-separated values (CSV) data. However, if the data you are importing is in a different format, you can write a custom formatter to perform this translation step.
You provide a custom formatter as an OSGi (Open Service Gateway Initiative) bundle. However, much of the standard work of an OSGi bundle is handled by the VoltDB import framework. So you only need to provide selected components as described in the following sections.
Custom formatters can be used with both custom and built-in import connectors and with the standalone kafkaloader utility.
The following sections describe:
The structure of the custom formatter
Compiling and packaging custom formatter bundles
Installing and invoking custom formatters
Using custom formatters with the kafkaloader utility
The custom formatter must contain at least two Java classes: one that implements the org.voltdb.importer.formatter.Formatter interface and one that extends the org.voltdb.importer.formatter.AbstractFormatterFactory class.
For the sake of example, let's assume the custom formatter classes are called MyFormatter and MyFormatterFactory. When the associated import connector is initialized, the VoltDB importer infrastructure calls the classes' methods in the following order:
MyFormatterFactory.create() is called once, to initialize the formatter. The create method must return an instance of the MyFormatter class.
MyFormatter.MyFormatter() is invoked once when an instance of the MyFormatter class is initialized in the preceding step.
MyFormatter.transform() is called from the import connector every time it retrieves a record from the data source.
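The call order described above can be sketched with plain Java stand-ins. This is only an illustration: SketchFormatter, SketchFormatterFactory, and LifecycleSketch are hypothetical names invented for this sketch, not VoltDB classes, and the run() method merely plays the role of the import connector.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT the VoltDB classes.
interface SketchFormatter {
    Object[] transform(ByteBuffer payload);
}

abstract class SketchFormatterFactory {
    abstract SketchFormatter create();
}

final class LifecycleSketch {
    // Records each call so the order can be inspected afterwards.
    static final List<String> calls = new ArrayList<>();

    static final class MyFormatter implements SketchFormatter {
        MyFormatter() {
            calls.add("MyFormatter()");
        }

        @Override
        public Object[] transform(ByteBuffer payload) {
            calls.add("transform");
            // Assumes UTF-8 input; the whole record is returned as one value.
            String row = StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
            return new Object[] { row };
        }
    }

    static final class MyFormatterFactory extends SketchFormatterFactory {
        @Override
        SketchFormatter create() {
            calls.add("create");
            return new MyFormatter();
        }
    }

    // Plays the role of the import connector:
    // one create() call, then one transform() call per record.
    static List<String> run(String... records) {
        calls.clear();
        SketchFormatter formatter = new MyFormatterFactory().create();
        for (String record : records) {
            formatter.transform(ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8)));
        }
        return new ArrayList<>(calls);
    }
}
```

Calling LifecycleSketch.run("a", "b") records create, then MyFormatter(), then transform twice, which mirrors the order listed above.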
In many cases, the easiest way to create a custom class is to modify an existing example. VoltDB provides an example formatter in its GitHub repository that you can use as a base for your customizations, at the following URL:
The next sections describe how to modify this example — or how to create a custom formatter from scratch, if you wish.
You must create a class that extends the AbstractFormatterFactory class. However, within that class, all you need to do is override the create() method to return an instance of your implementation of the Formatter interface. So, assuming the new class names use the prefix "MyFormatter" and using the example formatter provided on GitHub, all you need to modify are the items highlighted in the following example:
package myformatter;
import org.voltdb.importer.formatter.AbstractFormatterFactory;
public class MyFormatterFactory extends AbstractFormatterFactory {
    /**
     * Creates and returns the formatter object.
     */
    @Override
    public MyFormatter create() {
        MyFormatter formatter = new MyFormatter(m_formatName, m_formatProps);
        return formatter;
    }
}
The bulk of the work of a custom formatter occurs in the class that implements the Formatter interface. Within that class, you must provide at least one method, which implements the interface's transform() method. You can, optionally, include a constructor that initializes the class and handles any properties that need to be passed into the formatter from the import configuration.
The constructor, that is, the method that initializes the class, has the same name as the class (in our example, MyFormatter). The constructor accepts two parameters: a string and a set of properties. The string contains the name of the formatter as specified in the database configuration file (see Section 7.3.3.2, “Configuring and Invoking Custom Formatters”). This string will, by definition, match the name of the class itself. The second parameter is a Java Properties object containing the properties set in the configuration file using the <format-property> element, along with certain VoltDB built-in properties, whose names all start with two underscores.
If the custom formatter doesn't require any information from the configuration, you do not need to include this method. However, if your formatter does require additional information, this method can retrieve and store information provided in the import configuration. For example, the MyFormatter() method in the following implementation looks for a "column_width" property and stores it for later use by the transform() method:
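package myformatter;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Properties;
import org.voltdb.importer.formatter.FormatException;
import org.voltdb.importer.formatter.Formatter;
public class MyFormatter implements Formatter {
    // Width of each fixed-width column, in characters
    int column_width = 0;
    MyFormatter (String formatName, Properties prop) {
        // Property values arrive as strings, so convert to an integer
        column_width = Integer.parseInt(prop.getProperty("column_width"));
    }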
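If the property might be missing or malformed in the configuration, the lookup can be made more defensive. The following is a minimal sketch using only java.util.Properties; the ColumnWidthProperty helper and its DEFAULT_WIDTH fallback are hypothetical and are not part of the VoltDB example.

```java
import java.util.Properties;

final class ColumnWidthProperty {
    // Hypothetical fallback used when the property is absent;
    // choose whatever value suits your data.
    static final int DEFAULT_WIDTH = 10;

    // Reads "column_width" and returns it as a positive integer.
    static int parse(Properties props) {
        String raw = props.getProperty("column_width");
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_WIDTH;
        }
        int width;
        try {
            width = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("column_width must be an integer: " + raw, e);
        }
        if (width <= 0) {
            throw new IllegalArgumentException("column_width must be positive: " + raw);
        }
        return width;
    }
}
```

A constructor could then store the result of ColumnWidthProperty.parse(prop) instead of converting the raw string itself, so that a bad configuration value fails when the formatter is created rather than on the first record.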
The method that does the actual work of formatting the incoming data is the transform() method. This method receives the incoming data as a Java byte buffer and is expected to return an array of Java objects representing the input parameters, which will be passed to the specified stored procedure to insert the data into the database.
For example, if the custom formatter expects data in fixed-width columns, the method might look like this:
    @Override
    public Object[] transform(ByteBuffer payload) throws FormatException {
        String buffer = new String(payload.array());
        ArrayList<Object> list = new ArrayList<Object>();
        int position = 0;
        while (position < buffer.length()) {
            int endpoint = Math.min(position+column_width, buffer.length());
            list.add(buffer.substring(position,endpoint));
            position += column_width;
        }
        return list.toArray();
    }
}
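The same fixed-width splitting can be tried outside VoltDB with a standalone helper. This sketch makes some assumptions that the original example does not state: the payload is UTF-8 text, widths are counted in characters rather than bytes, and only the bytes between the buffer's position and limit belong to the record. FixedWidthSplitter is a hypothetical name used for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FixedWidthSplitter {
    // Splits the readable part of the buffer into columns of columnWidth characters.
    // The final column may be shorter if the row length is not an exact multiple.
    static Object[] split(ByteBuffer payload, int columnWidth) {
        if (columnWidth <= 0) {
            // Guards against a loop that would never advance.
            throw new IllegalArgumentException("columnWidth must be positive");
        }
        // duplicate() leaves the caller's position and limit untouched.
        String row = StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
        List<Object> columns = new ArrayList<>();
        for (int start = 0; start < row.length(); start += columnWidth) {
            int end = Math.min(start + columnWidth, row.length());
            columns.add(row.substring(start, end));
        }
        return columns.toArray();
    }
}
```

For instance, splitting the text "aaabbbcc" with a width of 3 yields the three values "aaa", "bbb", and "cc". A transform() method could delegate to a helper like this, which also makes the parsing logic easy to unit test without a running database.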
Once the custom formatter source code is complete, you are ready to compile and package the formatter as an OSGi bundle.
When compiling the source code, be sure to include the VoltDB JAR files in the Java classpath. For example, if VoltDB is installed in the folder /opt/voltdb, you will need to include /opt/voltdb/voltdb/* and /opt/voltdb/lib/* in the classpath.
You will also need to include a number of OSGi-specific attributes in the final JAR file manifest. For example, you must include the Bundle-Activator attribute pointing to the FormatterFactory class. To ensure all the necessary properties are set, it is easiest to use the ant utility and an ant build file. The following is an example build.xml file, with the items that you must modify highlighted in bold text:
<project default="build">
    <path id='project.classpath'>
        <!-- Replace this with the path to the VoltDB jars -->
        <fileset dir='/opt/voltdb'>
            <include name='voltdb/*.jar' />
            <include name='lib/*.jar' />
        </fileset>
    </path>
    <target name="build" depends="clean, dist, formatter"/>
    <target name="clean">
        <delete dir="obj"/>
        <delete file="myformatter.jar"/>
    </target>
    <target name="dist">
        <mkdir dir="obj"/>
        <javac srcdir="src" destdir="obj">
            <classpath refid="project.classpath"/>
        </javac>
    </target>
    <target name="formatter">
        <jar destfile="myformatter.jar" basedir="obj">
            <include name="myformatter/MyFormatter.class"/>
            <include name="myformatter/MyFormatterFactory.class"/>
            <manifest>
                <attribute name="Bundle-Activator" value="myformatter.MyFormatterFactory" />
                <attribute name="Bundle-ManifestVersion" value="2" />
                <attribute name="Bundle-Name" value="My Formatter OSGi Bundle" />
                <attribute name="Bundle-SymbolicName" value="MyFormatter" />
                <attribute name="Bundle-Version" value="1.0.0" />
                <attribute name="DynamicImport-Package" value="*" />
            </manifest>
        </jar>
    </target>
</project>
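After building, it can be useful to confirm that the manifest attributes actually made it into the JAR. The following sketch uses the standard java.util.jar classes; BundleManifestCheck is a hypothetical helper, and its list of expected attributes is simply the set used in the build file above, not an authoritative list of what VoltDB requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class BundleManifestCheck {
    // Attribute names taken from the example build.xml; adjust as needed.
    static final String[] EXPECTED = {
        "Bundle-Activator", "Bundle-ManifestVersion", "Bundle-Name",
        "Bundle-SymbolicName", "Bundle-Version", "DynamicImport-Package"
    };

    // Returns the names of expected attributes that the manifest does not define.
    static List<String> missing(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        List<String> absent = new ArrayList<>();
        for (String name : EXPECTED) {
            if (main.getValue(name) == null) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

To check a packaged bundle, open it with java.util.jar.JarFile, pass the result of getManifest() to BundleManifestCheck.missing(), and confirm that the returned list is empty.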
Once you have built and packaged the custom formatter, you are ready to install and use it in your VoltDB infrastructure.
To install the custom formatter, you simply copy the formatter JAR file (in the preceding examples, myformatter.jar) to the bundles folder in the VoltDB installation on every server in the cluster. For example, if VoltDB is installed in /opt/voltdb:
$ cp myformatter.jar /opt/voltdb/bundles/
Once the JAR file is available to all VoltDB instances, you can configure and invoke the custom formatter as part of the import configuration. Note that the import configuration can be changed either before the database cluster is started or while the database is running, using either the voltadmin update command or the web-based VoltDB Management Center.
You choose the formatter as part of the import configuration using the format attribute of the <configuration> element in the database configuration file. Normally, you use the built-in "csv" format. However, to select a custom formatter, set the format attribute to the name of the formatter JAR file and its class name. For example:
<import>
<configuration type="kafka" format="myformatter.jar/MyFormatter" >
[ . . . ]
Storing your custom JAR in the bundles directory is recommended. However, if you choose to keep your custom code elsewhere, you can still reference it in the configuration by including the absolute path to the file location as part of the format attribute. For example, if your JAR file is in the /etc/myapp folder, the format attribute value would be "file:/etc/myapp/myformatter.jar/MyFormatter".
The formatter JAR must be in the same location on all nodes of the cluster.
Within the import configuration, you can also include any properties that the formatter needs using the <format-property> element. For example, in the preceding example, the custom formatter expects a property called "column_width", so the configuration might look like this:
<import>
<configuration type="kafka" format="myformatter.jar/MyFormatter" >
<property name="brokers">kafka.myorg.org:9092</property>
<property name="topics">customer</property>
<property name="procedure">CUSTOMER.insert</property>
<format-property name="column_width">15</format-property>
</configuration>
</import>
You can also use custom formatters with the standalone kafkaloader utility. To use a custom formatter with kafkaloader you must:
Declare environment variables for FORMATTER_LIB and ZK_LIB
Create a formatter properties file specifying the formatter class and any formatter-specific properties the formatter requires.
The environment variables define the paths to the formatter JAR file and the Apache ZooKeeper libraries, respectively. (Note that ZooKeeper does not need to be running, but you must have a copy of the standard ZooKeeper libraries installed and accessible via the ZK_LIB environment variable.)
The formatter properties file must contain, at a minimum, a "formatter" property that is assigned to the formatter class of the custom formatter. It can contain other properties required by the formatter. The following is the properties file for kafkaloader that matches the example given in the previous section to configure the custom formatter using the built-in importer infrastructure:
formatter=MyFormatter
column_width=15
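Because this file uses the standard Java properties syntax of one key=value pair per line, it can be sanity-checked before you start the loader. The following is a minimal sketch with java.util.Properties; FormatterConfigCheck is a hypothetical helper and says nothing about how kafkaloader itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class FormatterConfigCheck {
    // Parses properties-file text and verifies the required "formatter" entry.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        String formatter = props.getProperty("formatter");
        if (formatter == null || formatter.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required 'formatter' property");
        }
        return props;
    }
}
```

Passing the two-line file shown above returns a Properties object in which "formatter" is MyFormatter and "column_width" is 15, while text without a formatter entry is rejected with an exception.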
If both your formatter and the ZooKeeper libraries are in a folder myformatter under your home directory, along with the preceding properties file, you could start the kafkaloader utility with the following commands to use the custom formatter:
$ export FORMATTER_LIB="$HOME/myformatter/"
$ export ZK_LIB="$HOME/myformatter/"
$ kafkaloader --formatter=$HOME/myformatter/formatter.config \
     --topic=customer --zookeeper=kafkahost:2181