An interesting solution

Of cource this problem can be solved using Aho Corasick algorithm. But there is another much more simple, but not determenistic solution. A hash function can be used to compare substrings of the text and the words. First we should find the hash function for all the words. Then we should evaluate the hash function for each substring of the text that has the same length as some word. And finaly comparing this this values of hash functions gives us the answer.

Re: An interesting solution

My implementation of this solution works even faster than implementation of Aho Corasick's algorithm. Using clever structures and technichs we can achieve complexity of O(L*sqrt(K)), where L is the length of the text and K is the total length of all words.

Re: An interesting solution

Posted by

Vlad 24 Mar 2011 15:34

O(L*sqrt(K)) ?

L <= 900k, K<=100k

This means L*sqrt(K) <= 285 mln - not really that fast..

Aho-Corasick is something like O(L + K*alpha) or O( (L+K)log(alpha) ), where alpha - alphabet size; clearly significantly better.

*Edited by author 24.03.2011 15:34*