ENG  RUSTimus Online Judge
Online Judge
Problems
Authors
Online contests
About Online Judge
Frequently asked questions
Site news
Webboard
Links
Problem set
Submit solution
Judge status
Guide
Register
Update your info
Authors ranklist
Current contest
Scheduled contests
Past contests
Rules
back to board

Discussion of Problem 1269. Obscene Words Filter

Help with aho corasick MLE 7,please !
Posted by uu_Innk 8 May 2011 09:54
#include <iostream>
#include <queue>
#include <map>
using namespace std;
#define MAX 2000000
#define K 128

    map<int,int> Next;

    short link[MAX];
    bool leaf[MAX];
    short p[MAX];
    char I[MAX];

int sz,n,m,vv=0;
unsigned char ch;
char chh;
queue<short> q;

int main(){
    p[0] = 0;
    sz = 1;
    scanf("%d",&n);getchar();
    while(n--){
        int v=0;
        while((ch=getchar())!='\n')    {
            if (Next[v*1000+ch] == 0){
                p[sz] = v;
                I[sz] = ch;
                Next[v*1000+ch] = sz++;
            }
            v = Next[v*1000+ch];
        }
        leaf[v] = true;
    }

~8400Kb

Wrong bor ??
Re: Help with aho corasick MLE 7,please !
Posted by AterLux 8 May 2011 12:49
I can't understand your Aho-Corasick realistation.
Why v * 1000, is ch only 32 thru 255 ?
then, why max 2000000? Are you sure, that your FSM will never have more than 2000 states?
Re: Help with aho corasick MLE 7,please !
Posted by uu_Innk 9 May 2011 15:50
map<int,int> Next;

I use 'int' 4-8 digit number under the peaks, and 1-3 digit number under the symbol (for which there is a transition)

{я использую в 'int'  4-8 разряд под номер вершины, а 1-3 разряд под номер символа (по которому есть переход) }

So looks like the transition to a tree:

int go (int g,unsigned char ch) {
    ///
        ///
    return Next[g*1000+ch];
}

Edited by author 09.05.2011 15:50
Re: Help with aho corasick MLE 7,please !
Posted by AterLux 9 May 2011 20:38
That wrong way: imagine dictonary

ablahblahblahblah... (10000 long)
bblahblahblahblah... (10000 long)
...
jblahblahblahblah....(10000 long)

so you will have 100000 nodes in your tree. If you multiply it by 1000 (or even by 256-32=224), you will have 100000000 (or 22400000) entries, that cann't fit in memory limit.

so you have to store link-symbols in other way. For example, you can imagine, that evry node can be reached only by one symbol, so you can store symbol in your node and make something like linked list of child-nodes

root
   |
node(a) -- node(b) -- ... -- node(n)
   |
nodelevel2(a) -- ....

that will require only, letter, child-link, sibling-link, and fault-function value.
So, may be you will wish to store queue-link field in each note (that will help you to build Aho-Corasick FSM) - that total less than 24 bytes in each of 100000 nodes

Other way: to store for each node dynamic array of used letters and corresonded child-nodes.