Dec 06, 2024

Problems I Wrote in 2024

It's already that time of the year. Since it is unlikely I will have any new problems by the end of this year, probably I can post this now. By the way, even though the title is Problems I Wrote in 2024, technically this post contains problems that were published in 2024 i.e. they might have been written before that. Though, for the sake of consistency with previous years' posts, I will just keep the title and post like this.

As I planned, I wrote even fewer problems this year. In total, I wrote around 5 problems, and in one of them I didn't even do much. Among them, I probably like Buggy DFS the most. Though, I also like GCDDCG, perhaps because of how the problem was developed.

As usual, I would talk about those problems. Be it the background story on how the problem was developed, solutions, etc. There is no particular ordering on the problems, I just write them here as I remember them.

CS2040S - Buff! Buff! Buff!

Problem link: Here
Editorial: None

I cannot recall how I wrote this problem... But it should be just along the line of "spam query". Funny thing is while this problem is for the practical exam of CS2040S in Sem1 24/25, I actually finished preparing it before the semester even started. One thing I like about this problem is that the subtasks, in a sense, help to solve the harder subtasks (well, except for the first two subtasks).

Like last year, I consciously set the constraints such that 32-bit integers are sufficient. Though, in hindsight, maybe I should not have allowed BUFF or BUFF_MAX when no cards were affected. Depending on the implementation, allowing these can cause errors. Back then, I allowed these because they are still well-defined, and I don't want to add more constraints which may potentially cause confusion to students. Though, looking at the results, I suspect some students didn't realize these are valid according to the constraints.

CS2040S - Tea Party

Problem link: Here
Editorial: None

Earlier that year, I stumbled upon an interesting-looking paper. From skimming, the paper seems to claim that hash table with linear probing (+ using deleted marker/tombstone smartly) is actually very good. This kinda inspired me to write this problem. You might observe that the problem statement can also be described as, "simulate hash table that uses linear probing!". However, in this problem, as the input is adversarial (i.e. we have to deal with worst-case), the table is large, and we cannot rehash, we cannot just simulate things naively.

Oh yeah, like Buff! Buff! Buff!, I also finished preparing this before the semester even started.

While I typically tried to set the constraints for practical exam problems such that 32-bit integers are sufficient, this time I couldn't for technical reasons. Also, one thing I find amusing about this problem is that there is a partial solution that works nicely if there is no deletion, that I thought "nah, no student will write this". And somehow, the highest-scoring student for this problem wrote that solution...

Trivia: I initially wanted to make the problem statement to be related to Arknights, but in the end, I got too lazy and the only thing inspired by Arknights was the rabbit (no, Amiya is not a donkey).

ICPC APAC 2024 - XOR Operations

Problem link: Here
Editorial: Here

I still have zero idea why Prabowo listed me as a co-author, given that I didn't do much. The only thing I did was conjecture the formula of the answer, along with a stupid implementation that works if $\max(A)$ is not too large. Though maybe the conjecture did help Prabowo, as he was later able to prove that the conjecture is indeed correct. In some sense, the proof was formula-(conjecture)-driven-proof.

So one day, Prabowo just gave me this problem he wrote which he, at that time, had zero idea whether it could be solved or not. Me, being bored, wrote brute force and tried to figure out some explicit formula for the output. After some trial-and-error, I am pretty convinced that the answer must be a $2^k$ for some integer $k$ . After some trial-and-error again, I am pretty convinced that $k$ is $x \cdot (x + 1) / 2 + x \cdot (n-x-1)$ where $x$ is the XOR basis size of $A'$ , in which $A'$ is an array with $N^2$ elements containing all pairwise XOR from $A$ . I sent my brute force code to Prabowo along with the formula I conjectured and then moved on to other stuffs.

Several months later, as I was having lunch with him, Prabowo said that he could prove that the formula was correct. I already forgot his proof, though I recall reading it and for most of the part, convinced of its correctness.

Note that the formula conjectured above can be easily calculated if $\max(A)$ is quite small. For example, when $\max(A) \le 10^5$ , $A'$ can be somewhat computed in $O(\max(A) \log \max(A))$ time using FWHT. Hence, if $\max(A) \le 10^5$ , then I wouldn't like this problem as it boils down to guessing. Luckily, from his proof, Prabowo can solve the problem in like, $O(n \log \max(A))$ time, which doesn't seem trivial to me. At least, I cannot immediately figure out why the solution is correct. Or maybe it's just a skill issue, I don't know.

By the way, at this point you might notice "this problem can't be written in 2024", as ICPC APAC 2024 was held around February or March, and no way the committee like to live that dangerously. Yes, technically this problem was written in 2023.

ICPC Jakarta 2024 - Buggy DFS

Problem link: Here Editorial: Here

One of the perks of teaching Data Structures & Algorithms course for undergrad is that you get exposed to $10^9+7$ buggy implementations of classical algorithms. While technically I have not seen any student write something like in this problem, I would still say that this problem is mostly inspired by my teaching experience.

Originally, the title of the problem was just BDFS, where it stands for breadth-depth-first-search. Yeah, the buggy code is actually a buggy BFS made more buggy by replacing the queue with a stack, which, somehow, makes the code really close to a working DFS. I felt that the original statement is funnier as then the editorial can mention "yeah, the B actually stands for buggy".

To be honest, I don't really like the editorial of this problem. It tells the reader how to solve the problem, but not why it can be solved that way. Like, the graph construction seems to pop out of nowhere. Note: I was not involved in any preparation of ICPC Jakarta this year (apart from proofreading my problems' description).

So let me go through a bit on how I worked on this problem.

The first thing I did was trying to simplify the BDFS function. My first attempt, which was also mentioned in Codeforces:

ModifiedBDFS():
    let S be a stack
    let flag be a boolean array of size N
    let counter be an integer

    S := {1}
    flag := initialize all with false
    counter := 0

    while S is not empty:
        v := S.pop()
        counter := counter + #neighbors_of_v
        if flag[v] is true:
            continue

        flag[v] := true
        for each u neighbor of v in **ascending** order:
            if flag[u] is false:
                S.push(u)
    
    return counter

Proof Sketch: Suppose a vertex $v$ is visited more than once. Note that after the first visitation of $v$ , it cannot be pushed to the stack anymore. However, from the first visitation of $v$ , all of its neighbors must have been pushed to the stack and therefore, visited before the next visitation of $v$ . Hence, from the second visitation onwards, we can just increment the counter with the degree of $v$ .

Note that this code is already an iterative implementation of DFS where we iterate the neighbors in descending order. If not sure, try to convince yourself the following recursive implementation computes the same thing.

DFS(v):
    flag[v] := 1
    counter := 0
    for each u neighbor of v in **descending** order:
        if flag[u] != 1:
            counter := counter + #neighbors_of_u
        if flag[u] = 0:
            counter := counter + DFS(u)
    flag[v] := 2
    return counter

return DFS(1) + #neighbors_of_1

The fact that the code is actually a DFS later hints on an explicit formula of what is computed in this function.

But before that, what was bugging me the most was "what $K$ is solvable?" Which, like usual, I resort to writing brute-force and finding patterns first. With a large enough graph, I saw that only $K \in \{1, 3, 5, 7, 9\}$ are unsolvable. Then, it hit me that if there is a graph in which the function outputs $X$ and there is a graph in which the function outputs $Y$ , then there is a graph in which the function outputs $X+Y$ . This is because we can "merge" those two graphs, e.g. to obtain a graph in which the function outputs $13$ , we can combine those with output $2$ and $11$ as shown below

            ---------         ---------
            |       |         |       |
1-2    +    1 - 4 - 3 - 2  =  1 - 4 - 3 - 2
                              |
                              --- 5

Since the smallest even solvable $K$ is $2$ and the smallest odd solvable $K$ is $11$ , if we ignore the limit on graph size, $-1 -1$ only happens when $K \in \{1, 3, 5, 7, 9\}$ .

Now, back to the construction. From the DFS code, we can actually derive the explicit formula to be $deg(1) + \sum_{v=2}^{N} in(v) \cdot deg(v)$ , where $in(v)$ is the number of the $v$ 's ancestor $u$ in the DFS tree produced by the DFS such that there is an edge between $u$ and $v$ , and $deg(v)$ is the number of $v$ 's neighbor. In particular, $in(v)$ denotes how many times $v$ is visited.

And then from the formula, it is clear that if we want to maximize the output, we need some vertices which $in(v) \cdot deg(v)$ is large. And that is where the graph in the editorial comes in. We can see that the DFS tree of the graph will be a chain, where the "leaf" i.e. vertex $2$ has a large $in(v)$ and $deg(v)$ .

Using every observation we have so far, it is not hard to devise a greedy solution that requires $O(\sqrt{K})$ edges and vertices, which also satisfies the limitations from the problem.

Note: in my proposal, my construction is a bit more expensive. For whatever reason, I attach additional vertices to vertex $2$ so the DFS tree looks like a broom rather than a chain. Then the scientific committee found a better graph as shown in the editorial.

Note 2: I initially allowed the graph to have $100\;000$ vertices and $200\;000$ edges, but then the scientific committee lowered the limit. I have no strong opinion on this decision.

Note 3: I still have no idea why the smallest solvable odd $K$ is $11$ .

ICPC Jakarta 2024 - GCDDCG

Problem link: Here
Editorial: Here

Perhaps the biggest offender to the title of this post. When do you think this problem was originally written?

2024? 2023? 2022?

No, it was written way back in 2016. In fact, this was one of the earliest problems I wrote. IIRC, this was written around the time I wrote this problem.

Okay, I wrote this in 2016. So why does it take so long to appear in a contest? Short story: skill issue.

Back in 2016, when I formulated this problem, my initial thought was "probably I can play around with how we do inclusion-exclusion in counting problems involving GCD". Which, as you can imagine, does not work that easily. After spending some time, I still could not solve it, and I gave up. Back then, I stored all my ideas locally on my laptop.

Fast forward to 2023, I haven't been using the aforementioned laptop for several years (I bought a new one when I started my PhD). As my family is moving, my sister asked whether I want to keep that laptop which has been collecting dust in our old home. Nope, so I booted it up for the last time and backed up any important stuff there. There is almost none, but then I remember I used to have a local note for problem ideas. I found it, and then put its content to my current notes. After that, so long my old laptop.

Then, I was having vacation, and I was bored. I opened up my notes, and then I remember I've written this problem before. I thought, "I got nothing to do anyway, so let's try this one again".

While I have learned a bunch of new stuffs, it does not mean this problem has become easy. I started with the obvious $O(N^3 \log N)$ DP. After giving up on using a "one-dimensional" approach, I started with "two-dimensional" approach, and managed to solve it in $O(N^2 \log^2 N)$ . Then, with a new observation (that I felt surprising at the time), I can reduce it to $O(N^2 \log N)$ . With some silly optimization, I can reduce it further to $O(N^2)$ .

At this point, I felt a bit happy, but it seemed off. Can't it be solved in $o(N^2)$ time? I then proceed to bash the formula, breaking it into several pieces. Most of the pieces are easy until I am left with the most tedious part, that looks like

\sum_{i=1}^{N} \sum_{j=1}^{N} \mu(i, j) \cdot 2^{c_i - c_{LCM(i, j)}}2^{c_j - c_{LCM(i, j)}}3^{c_{LCM(i, j)}}

After that, I felt lost. My vacation is over, so I went back to Singapore. Then, the following morning after I went back, as I showered

> "Okay I can somewhat solve this fast if all of the LCMs are at most N"
> "But what to do with those with LCM greater than N"
> "Hmm their c must be 0"
> "Wait, can't we use whatever we have counted from those with LCMs at most N?"

So with the help of shower-kun, I somehow managed to solve the tedious part. I quickly finished showering, wrote the idea, and was pretty confident that it would work. I wrote the code, tested it against the naive solutions, and they always match. Furthermore, it can be proven that the time complexity is $O(N \log^2 N)$ !

So yeah, if I want to exaggerate, it took me 7 years to solve this problem. The thinking process is somewhat captured in my problem proposal (I modified the file a bit to remove additional information in the proposal). Note that while the time complexity is a bit different from the one done in the editorial by the scientific committee of ICPC Jakarta 2024, I think it is just a matter of implementation. From what I skimmed it looks like the general idea is the same.

Note: If you are wondering why even though I solved this in 2023 it appeared at the end of 2024, it's because initially I proposed it somewhere and got rejected.

Note 2: My biggest concern with this problem is that the problem formulation seems very simple, that it might have appeared within the last 7 years. Hopefully, judging from the official and Codeforces scoreboard, this is not the case.

Afterwords

I mentioned last year that I'm not planning to write any new problems, and this might continue for an indefinite time. While I did write a new problem (e.g. Buggy DFS), it was completely unplanned beforehand. Moreover, now I am pretty sure everything in my notes has been published or simply bad. Therefore, most likely there won't be Problems I Wrote in 2025.

On the other hand, I am happy that I saw a younger name in the scientific committee of ICPC Jakarta (i.e. Pikatan). Hopefully, there will be more new, young members in the scientific committee in future iterations.

← For a Long Time, I Misunderstood Amortized Analysis