Help with aho corasick MLE 7,please !
#include <iostream>
#include <queue>
#include <map>
using namespace std;
#define MAX 2000000
#define K 128
map<int,int> Next;
short link[MAX];
bool leaf[MAX];
short p[MAX];
char I[MAX];
int sz,n,m,vv=0;
unsigned char ch;
char chh;
queue<short> q;
int main(){
p[0] = 0;
sz = 1;
scanf("%d",&n);getchar();
while(n){
int v=0;
while((ch=getchar())!='\n') {
if (Next[v*1000+ch] == 0){
p[sz] = v;
I[sz] = ch;
Next[v*1000+ch] = sz++;
}
v = Next[v*1000+ch];
}
leaf[v] = true;
}
~8400Kb
Wrong bor ??
Re: Help with aho corasick MLE 7,please !
I can't understand your AhoCorasick realistation.
Why v * 1000, is ch only 32 thru 255 ?
then, why max 2000000? Are you sure, that your FSM will never have more than 2000 states?
Re: Help with aho corasick MLE 7,please !
map<int,int> Next;
I use 'int' 48 digit number under the peaks, and 13 digit number under the symbol (for which there is a transition)
{я использую в 'int' 48 разряд под номер вершины, а 13 разряд под номер символа (по которому есть переход) }
So looks like the transition to a tree:
int go (int g,unsigned char ch) {
///
///
return Next[g*1000+ch];
}
Edited by author 09.05.2011 15:50
Re: Help with aho corasick MLE 7,please !
That wrong way: imagine dictonary
ablahblahblahblah... (10000 long)
bblahblahblahblah... (10000 long)
...
jblahblahblahblah....(10000 long)
so you will have 100000 nodes in your tree. If you multiply it by 1000 (or even by 25632=224), you will have 100000000 (or 22400000) entries, that cann't fit in memory limit.
so you have to store linksymbols in other way. For example, you can imagine, that evry node can be reached only by one symbol, so you can store symbol in your node and make something like linked list of childnodes
root

node(a)  node(b)  ...  node(n)

nodelevel2(a)  ....
that will require only, letter, childlink, siblinglink, and faultfunction value.
So, may be you will wish to store queuelink field in each note (that will help you to build AhoCorasick FSM)  that total less than 24 bytes in each of 100000 nodes
Other way: to store for each node dynamic array of used letters and corresonded childnodes.