Common BoardMake two separate arrays for x and y coordinates of cockroaches. Don't use horizontal instructions in tight loops. Perform all the develop and debug in 32-bit mode (evidently GCC/Clang on server has -m32 key). x86-64 has RIP-relative addressing, there would be bottlneck when you go back to 32-bit build and try to backport from 64-bit one. First loop for each sweet is straitforward. Don't use `displacement(base,index,scale)` addressing. Always run all over the `base` part. Finally it may give about a couple of hundred ms. In second loop for each sweet you may use BSF instruction to get least significant bit offset (remeber, judge runs on Sandy Bridge arch). Use LEA to calculate simple index arithmetic. Here is mine: size_type R = 0; asm ( "vbroadcastsd %[dmin], %%ymm5;" "dloop:" "vcmpgt_oqpd %[d](,%[i],8), %%ymm5, %%ymm6;" "vmovmskpd %%ymm6, %%eax;" "test %%eax, %%eax;" "jz rnext;" "rloop:" "bsf %%eax, %%edx;" "lea 1(%%edx,%[i]), %%edx;" "mov %%edx, %[results](,%[R],4);" "inc %[R];" "lea -1(%%eax), %%edx;" "and %%edx, %%eax;" "jnz rloop;" "rnext:" "add $4, %[i];" "cmp %[M], %[i];" "jl dloop;" : [R]"=b"(R) : [d]"m"(d), [dmin]"m"(dmin), "0"(R), [results]"m"(results), [M]"r"(M), [i]"c"(0) : "cc", "memory", "%eax", "%edx", "%ymm5", "%ymm6" ); Custom IO gives about 300+ ms. I hope M*N solutions finally will loose their ACs someday (at least until hardware will be updated to AVX512-compatible). It turns out, that there are possible two way to solve this problem (both M*N) in addition to naïve one (1400ms). 1.) Read all sites, then for each query point do find all the preliminary neighbours using float coordinates (it is almost two time faster in terms of membory bandwitdth, then if use double). This is a "float" sieve. Then for all sites passed "float" sieve perform similar "double" sieve. It get about 500ms (for me from 1.45s to 950ms). Test #16 is still hardest. Prefetch instructions in long runs gain up to 100ms. First loop (precalc of squared float distances) can be unfolded 4 times by query points (vbroadcastss from fixed memory location works fast in loop, you should not occupy dedicated ymm register for these). Second loop, where squared distances compared against least distance (+ eps of course), found in previous loop, can be unfolded 4 times (you can occupy all 32 bit of some register using bit shift (ror, rol) after vmovmskps). I think it allow to better utilize branch prediction mechanism. This approach can be beaten by simple test: cloud of sites in one corner in 10x10 square and cloud of query points in opposite aslo in 10x10 square both in general position (not matters much). Or wiser: query points placed on quarter of circle with center in one corner and radius of 20000, sites on another quarter of circle with center in the same corner and small radius or vice versa. Or even: two distant strait lines: one with query points and another with sites. 2.) You can solve the problem in single loop using just doubles with randomization. If your random numbers hit right sites, then you can probably make even fastest solution (but it is too improbably). I can't find counterexample for this randomized approach. I am very interested in Progbeat's solution details. It seems (by memory consumed) he used approach very similar to mine. It would be great if we'll exchange our solutions somehow =). 17 1 2 3 4 5 6 7 2 8 9 5 10 8 9 11 3 9 1 2 8 9 1 2 3 9 ?????????????????????????????????????? I suspect rounding errors or something I have WA#9 too. The program seems to be right. Who can help me? I think, that your calculations not so are exact, as that is demanded with a problem. Try to not use intermediate values (for example radius etc.). you have all tests on timustests.4t.com Thank you very much. You were right about my bug. I have just change in my C++ code ceil(x) to ceil(x-eps) and floor(x) to floor(x+eps) and got AC!
Hey, UNKNOWN_LAMER! Do I know you from somewhere? je Edited by author 11.02.2018 04:58 [code deleted] Edited by moderator 19.11.2019 23:40 Hi, Very good solution! Just one note, it doesn't affect AC, but: In D+=(i+N/i); if (i == N/i) you don't need to add both i and N/i, but just i. In this case result of check(N) will be 100% equal to triviality(N) in all cases. Please, check test #26, it seems to be incorrect. Oh no, that test is correct! I got AC. What is a test 6? Whoever gets Wrong Answer at 6th test, try to change your way of finding the power of two. My solution got WA#6 when I calculated power of two this way: power = ceil(log(x)/log(2)); Better calculate it by multiplying it. I remember once going to see him when he was ill at Putney. I had ridden in taxi cab number 1729 and remarked that the number seemed to me rather a dull one, and that I hoped it was not an unfavourable omen. "No," he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways." G.H.Hardy 1729 = 1^3 + 12^3 = 9^3 + 10^3 #include<iostream> #include<cmath> using namespace std; int func(int k, int n, int m) { if(n == 0) return 1; else return func(k, n-1, m)*k%m; } int main() { int n, m, y, x, i = 0; int b; bool flag = true; cin >> n >> m >> y; if((n >0 && n < 999) && (m > 1 && m < 999) && (0 < y && < 999)) { while(i < m) { x = func(n, i, m); if(x == y) { cout << i << ' '; flag = false; } i++; } if(flag) cout << "-1"; } else cout << "-1"; return 0; } Edited by author 08.02.2018 02:25 Edited by author 08.02.2018 02:27 Firstly, the problem asks to find number of edges in minimum edge cover Secondly, the number of such edges plus the number of maximum matching equals to number of vertices, that is N + M Thirdly, you can find the number of max. matching easily with dfs-like algorithm (you can find code online). The answer is N + M - (number of max. matching) I get TLE on 21 test. What can you advise? i think 212 in the Example test is wrong cause we have 2 remix of "I miss you" when the most remix that we can reach is 1 . Edited by author 21.01.2018 23:56 It says: "in a row". It means that there can be any number of remixes of each song, the most important part it that the number in each row doesn't;t have to exceed a for 1st and b for 2nd in: 9 + 10 + 10 + 10 - 10 - 10 + 5 - 10 - 5 + 123 out: 10 10 10 10 10 5 5 1 123 GL. Sqrt decomposition works fine in this problem. N = int(input()) if N == 1: print(1,1) exit() for P in range(round(N**0.5)+1,0,-1): A = (2*N - P**2 + P) / (2*P) if A % 1 == 0: print(int(A),P) exit() This problem isn't hard! Just use Burnside's lemma. One would need Polya enumeration theorem to obtain an explicit formula though. Well, the theorem is a generalization of Burnside's anyway. using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; namespace ConsoleApplication25 { class Program { static void Main(string[] args) { int N = int.Parse(Console.ReadLine()), Sum=0; if (Math.Abs(N)<10000) { while (N != 1) { if (N > 1) { Sum = Sum + N; N--; } else { Sum = Sum + N; N++; } } } Console.WriteLine(Sum+1); } } } I've got Runtime error in Test 2. I've written the code in Java. Could anybody share the test case for the Test2? |
|