Question
I'm preparing a presentation on prefix computation (aka scan, the generalization of prefix summation to any associative operator) for a class I'm taking on parallel algorithms.
Several lists of applications of prefix computation include lexical analysis (tokenizing), but none cites a source for that claim. I have been unable to find any other reference to prefix computation being used that way.
Where has any description of the use of a prefix computation for the purpose of lexical analysis been published? (Or, what was the origin of this misinformation?)
Explanation / Answer
The important fact in this case is that the list append operation is associative.
Hadoop has "tokenize" as one of its built-in parallel operators. The meaning of "tokenize" in this case is much less general than "lexical analysis": it means splitting a string at a specific, well-defined set of delimiter characters. I doubt there is much you could do with a more general lexical analyzer (one that runs an arbitrary finite automaton over the string).
So suppose you need to split a string at every space character. You are essentially building a list of indices into the string, and list append is associative: each processor generates the list of split indices for the substring assigned to it, and then all the lists are appended together.
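To make that concrete, here is a minimal sketch in Python (not Hadoop code; the chunking scheme and helper names are invented for illustration) of splitting at spaces as a per-chunk map followed by a reduction whose combining operator is list append:

```python
from functools import reduce

def indices_of_spaces(chunk, offset):
    """Map step: find split points (as global indices) within one chunk."""
    return [offset + i for i, c in enumerate(chunk) if c == " "]

def append(left, right):
    """The associative combining operator: list append."""
    return left + right

def parallel_split_points(s, num_chunks=4):
    n = len(s)
    bounds = [(k * n // num_chunks, (k + 1) * n // num_chunks)
              for k in range(num_chunks)]
    # Each "processor" handles its substring independently...
    partials = [indices_of_spaces(s[lo:hi], lo) for lo, hi in bounds]
    # ...and the partial lists are combined by appending them together.
    return reduce(append, partials, [])

print(parallel_split_points("split me at the spaces"))
# [5, 8, 11, 15]
```

Because append is associative, the reduction can be evaluated as a balanced tree across processors rather than strictly left to right, which is what makes the parallel formulation legitimate.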