SyntaxHighlighter

10 Sept 2015

Cartesian Products with Kleisli Composition


Over at the JOOQ blog, they have written about How to use Java 8 Functional Programming to Generate an Alphabetic Sequence. By "alphabetic sequence" they mean the (n-order) Cartesian product of the letters in the alphabet. For example, the third-order Cartesian product of the alphabet would be
AAA, AAB, AAC, ..., ZZZ
The author of that post (it is unsigned, but I believe it was written by Lukas Eder) claims that
the Java 8 Stream API does not offer enough functionality for this task.
I will show that this claim is wrong, and that there is in fact a pure Java 8 solution. This solution brings out the structure of the problem rather nicely. It is general in the sense that it uses a general function combinator well-known in functional programming. It relies only on the Java 8 Streams API without needing any constructs from jOOλ.

The first building block of the solution is provided by Misha in his answer to this Stackoverflow question. Here's Misha's function that will create a stream of combinations of its first argument with the given stream, where the mode of combination can be externally specified as well. (In the following I will not only call this method crossJoin, but also the function that it returns.)
<T, U, R> Function<T, Stream<R>> crossJoin(Supplier<Stream<? extends U>> streamSupplier, BiFunction<? super T, ? super U, ? extends R> combiner) {
    return t -> streamSupplier.get().map(u -> combiner.apply(t, u));
}

Now you can do the nth-order Cartesian product by applying flatMap with crossJoin a fixed number of times. A question that goes unanswered in that Stackoverflow discussion is at the core of the JOOQ post: How deal with the situation when the required number of applications is not fixed, but supplied at runtime as a parameter? That is, how do we implement

List<String> cartesian(int n, Collection<String> coll)
Of course, you might do it recursively. But that would rather count as supporting the claim that Java streams were not sufficient! Fortunately, the implementation becomes easy once we remember that the repeated application of functions is equivalent to a single application of the composition of these functions. In our case, we actually want sequential composition, which is just the flip of regular composition. Looking at the signature

crossJoin :: a -> Stream b
it may be hard to see at first how we might compose functions of this signature sequentially. Actually, however, there is guaranteed to be such a combinator. It is called "Kleisli composition", written ">=>", and has the following signature:

>=> :: (a -> Stream b) -> (b -> Stream c) -> (a -> Stream c)

In fact, >=> has a completely general definition for all so-called "monadic types", of which Stream in Java is one. What this means is that we could even abstract over Stream in the above signature, and implement sequential composition for all those types in the same way. This is easier in some languages (Haskell, Scala) than in others (Java). This article by Yoav Abrahami gives an instructive example of what you can do with Kleisli composition in Scala. It demonstrates the same trick we are about to perform, namely replacing recursion with a form of functional composition.

For the sake of example, instead of generalizing we will simplify somewhat. In our case we are dealing only with streams of strings. Here's the Java implementation of >=> for this special case:
BinaryOperator<Function<String,Stream<String>>> kleisli = (f,g) -> s -> f.apply(s).flatMap(g);
The Stream type, being "monadic", obeys certain axioms, one of which states that >=> is associative. So we may use it in a reduction. The required identity function is s -> Stream.of(s). You can easily convince yourself of that by considering the first-order product, which is of course just the original list, which we'll get by lifting every element to Stream and flat-mapping back down, without invoking crossJoin at all. (Of course, you can also prove it by plugging the identity into the definition of >=>.)

Let's put it all together: The idea is to create a stream of as many crossJoin functions as we need, reduce this stream to a single function in memory by composing them together, and finally apply the entire function chain in one fell swoop.

List<String> cartesian(int n, Collection<String> coll) {
    return coll.stream()
           .flatMap( IntStream.range(1, n).boxed()
                     .map(_any -> crossJoin(coll::stream, String::concat)) // create (n-1) appropriate crossJoin instances
                     .reduce(s -> Stream.of(s), kleisli)                   // compose them sequentially with >=>
                    )                                                      // flatMap the stream with the entire function chain
           .collect(toList());
}
The following call will give you a list of all three-letter alphabetic sequences:
cartesian(3, alphabet)
A variation on the above is when you do not want to multiply a collection n times with itself, but with n other collections (all of the same type). Instead of passing in n, you might pass in the sequence of those other collections, and instead of streaming an integer range you might stream that sequence, creating a crossJoin function for each element,  like this: Stream.of(colls).map(c -> crossJoin(c::stream, String::concat))

For good measure, here's how you may construct the list of letters alphabet, if you do not wish to write them down severally with Arrays.asList:
List<String> alphabet = IntStream.rangeClosed('A','Z')
                        .mapToObj(c -> String.valueOf((char) c))
                        .collect(toList());    

[Addendum]:
  1. It's important that we pass a Supplier<Stream> to crossJoin, not the stream itself. 
  2. You might also be interested in Tagir Valeev's answer to a related question on Stackoverflow.

1 comment:

  1. Very cool approach. For the List<Collection> cartesian yielding a List<List>, for the outer stream the first collection must be mapped to a stream of lists containing one of its elements, and the inner stream must discard the first collection.

    ReplyDelete