First and foremost, is seems to me that there are only two test cases: (1) the sample case (2) an 8MB test case
For WA this test helped me: input: 2 bbaadbdcc aadbdbcbadb aadbdbcbadb bbaadbdcc
output: Case #1: 3 2 5 Case #2: 5 2 3
For TLE: parse the input I optimized all sorts of things but eventually I ran out of ideas. Then I remembered someone else was mentioning he submitted the same code again and passed got rid of TLE. Also, from my own experience, I noticed there was a sort of "randomness" in the time execution from one submission to another. (I was able to detect it by killing my program after processing 6MB of the input file.)
At this point I was seriously considering speeding up the input reading part of my program using custom code instead of stdin.h. I got 760ms for processing 6MB of the input and I said to myself I'm really close: my code should run in 1.040s.
And so I decided to parse the input: I got AC in 560ms! A 480ms boost of speed! So... long story short: parse the input!