(Matt Fowles is a Senior Software Engineer for StreamBase, whose interests include programming language, compiler, and virtual machine design.)
As the ponytail (and the tiny bio above) imply, my posts will focus on technical details such as language semantics, general programming issues, compiler design, or anything else I feel is nifty.
This post is the second in a series exploring how new features in a language often impact unexpected corners of the language and can lead to a better whole.
In StreamBase 6.3, to be announced in early April, we added lists as a parameterized data type to StreamSQL, allowing easy representation of data of unknown length:
CREATE SCHEMA offer_details (
    symbol string,
    num_shares double,
    share_price double,
    user_id string
);
CREATE SCHEMA order_book (
    bids list(offer_details),
    asks list(offer_details)
);
It is worth noting that StreamSQL's parameterized data types are more like C++'s than Java's: a parameterized data type is not complete until it has been given a parameter. For example, list(int) or list(list(string)) are both complete types, but list alone does not specify a type. While Java's generics suffer because of their use of type erasure, StreamBase's parameterized types provide strong, static typechecking.
One of StreamBase's selling points is the ease of extending the StreamBase Expression Language with custom functions written in Java or C++. When integrating with Java, we automatically map between StreamBase types and their Java equivalents (e.g. StreamBase int to Java int or StreamBase string to Java String) for both the argument types and the return types of custom functions.
Obviously, the natural representation for a list(int) or list(string) in Java is a List<Integer> or List<String> respectively; unfortunately, at the JVM level these types don't exist. Both types become merely List after type erasure. Thus, we cannot automatically infer the type of an argument or return value for parameterized data types. We could require users to explicitly declare the types of their custom functions in the configuration file, but that is a bit inelegant (and also prevents users from writing a single Java function that can take several different list types like list(int), list(long), list(double)).
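To make the erasure problem concrete, here is a minimal plain-Java sketch (no StreamBase classes involved; the ErasureDemo name is ours, purely for illustration). It shows that a List<Integer> and a List<String> are indistinguishable by their runtime class, which is why the element type cannot be recovered from the list object itself.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true when both lists share one runtime class, i.e. the
    // element type parameter is not part of the runtime type.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        System.out.println(sameRuntimeClass(ints, strings)); // prints "true"
        System.out.println(ints.getClass().getName());       // prints "java.util.ArrayList"
    }
}
```

Both lists report java.util.ArrayList, with no trace of Integer or String.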
After some discussion, we decided that the users needed a more direct hook into the typechecking process for their functions. Thus, we added the concept of a function resolver. This is probably easiest to describe with an example:
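// Imports for CustomFunctionResolver and CompleteDataType (StreamBase API) omitted.
import java.util.ArrayList;
import java.util.List;

class ExprUtil {
    @CustomFunctionResolver("concatenateResolver")
    public static List<?> concatenate(List<?> v1, List<?> v2) {
        // A wildcard type cannot be instantiated or added to, so collect into List<Object>.
        List<Object> res = new ArrayList<Object>();
        if (v1 != null) { res.addAll(v1); }
        if (v2 != null) { res.addAll(v2); }
        return res;
    }

    // Resolver for concatenate: returns the result type, or null if the arguments do not fit.
    public static CompleteDataType concatenateResolver(
            CompleteDataType arg1, CompleteDataType arg2) {
        if (arg1.getElementType() == null) {
            return null; // not a list
        }
        if (arg1.equals(arg2)) { return arg1; }
        return null; // argument types differ
    }
}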
The concatenate function above takes two lists (provided they are the same type) and returns a list of that type. We use the @CustomFunctionResolver annotation to attach just such a resolver to the method, and everything else is taken care of. At typecheck time, the resolver is called as necessary to determine whether the function is a legitimate candidate. Then at runtime, the concatenate function is called appropriately. One can, of course, do more complicated manipulation of the types involved if inclined:
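// Imports for CustomFunctionResolver and CompleteDataType (StreamBase API) omitted.
import java.util.ArrayList;
import java.util.List;

class ExprUtil {
    @CustomFunctionResolver("flattenResolver")
    public static List<?> flatten(List<List<?>> top) {
        // A wildcard type cannot be instantiated or added to, so collect into List<Object>.
        List<Object> res = new ArrayList<Object>();
        if (top != null) {
            for (List<?> sub : top) {
                if (sub != null) { res.addAll(sub); }
            }
        }
        return res;
    }

    // Resolver for flatten: list(list(T)) resolves to list(T); anything else is rejected.
    public static CompleteDataType flattenResolver(
            CompleteDataType arg) {
        CompleteDataType inner = arg.getElementType();
        if (inner == null) {
            return null; // arg not a list
        }
        if (inner.getElementType() == null) {
            return null; // arg not a list of lists
        }
        return inner;
    }
}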
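The typing rules these two resolvers are meant to express can be tried out without a StreamBase installation. The sketch below is only a model: SketchType and ResolverSketch are hypothetical stand-ins we made up for illustration, not StreamBase classes, and the only behavior borrowed from CompleteDataType is that getElementType() returns null for anything that is not a list.

```java
import java.util.Objects;

// Hypothetical stand-in for a complete data type; NOT the StreamBase class.
final class SketchType {
    private final String name;         // e.g. "int", "string", or "list"
    private final SketchType element;  // null unless this type is a list

    private SketchType(String name, SketchType element) {
        this.name = name;
        this.element = element;
    }

    static SketchType simple(String name) { return new SketchType(name, null); }
    static SketchType listOf(SketchType element) { return new SketchType("list", element); }

    // Plays the role getElementType() plays in the resolvers above.
    SketchType getElementType() { return element; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SketchType)) { return false; }
        SketchType other = (SketchType) o;
        return name.equals(other.name) && Objects.equals(element, other.element);
    }

    @Override public int hashCode() { return Objects.hash(name, element); }

    @Override public String toString() {
        return element == null ? name : "list(" + element + ")";
    }
}

class ResolverSketch {
    // Rule for concatenate: both arguments must be the same list type.
    static SketchType concatenateResolver(SketchType arg1, SketchType arg2) {
        if (arg1.getElementType() == null) { return null; } // not a list
        return arg1.equals(arg2) ? arg1 : null;
    }

    // Rule for flatten: list(list(T)) resolves to list(T).
    static SketchType flattenResolver(SketchType arg) {
        SketchType inner = arg.getElementType();
        if (inner == null || inner.getElementType() == null) { return null; }
        return inner;
    }

    public static void main(String[] args) {
        SketchType listInt = SketchType.listOf(SketchType.simple("int"));
        SketchType listString = SketchType.listOf(SketchType.simple("string"));
        System.out.println(concatenateResolver(listInt, listInt));       // prints "list(int)"
        System.out.println(concatenateResolver(listInt, listString));    // prints "null"
        System.out.println(flattenResolver(SketchType.listOf(listInt))); // prints "list(int)"
        System.out.println(flattenResolver(listInt));                    // prints "null"
    }
}
```

A null result stands for "this function is not a candidate for these argument types", which is how the resolvers above signal a mismatch.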
As an added bonus, while implementing this feature, we realized that hierarchical data is just another parameterized data type. Prior to adding function resolvers, we didn't allow tuples as arguments to custom functions. But now that we have a simple, clean API for interfacing parameterized types between Java and StreamSQL, we can (and do).