• sudneo@lemm.ee
    5 hours ago

    Maybe some postmortem analysis will be interesting. AoC is also a self-contained domain, and there is probably a ton of training material on similar problems and tasks. I can imagine LLMs doing decently there.

    Also, there is no big consequence if they don’t, and it’s probably possible to brute-force (which is how many programming tasks have been solved).

    • aleq@lemmy.world
      36 minutes ago

      I think you’re spot on with LLMs being mostly trained on these kinds of tasks. Can’t say I’m an expert in how to build a training set, but I imagine it’s quite easy to do with these kinds of problems because it’s easy to classify a solution as correct or incorrect. This is in contrast to larger problems, which are guided less by algorithmic efficiency and more by sound design/architecture.

      Still, I think it’s quite impressive. You don’t have to go very far back in time to find top-of-the-line LLMs that were unable to solve these kinds of problems.

      > Also, there is no big consequence if they don’t, and it’s probably possible to brute-force (which is how many programming tasks have been solved).

      Usually with AoC, part 1 is brute-forceable but part 2 is not. Very often part 1 is to find the 100th number, and part 2 is to find the 1 000 000 000 000th number or something. Last year, out of curiosity, I had a brute-force solution for one problem that successfully completed on ~90% of the input. The solution was multi-threaded and ran on a 16-core CPU for about 20 days before I gave up. But the LLMs this year (not sure if this was a problem last year) are on the leaderboard of the fastest users to solve the problems.
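
To make the part 1 vs. part 2 gap concrete, here is a minimal Python sketch. It is not taken from any actual AoC puzzle: the update rule `step`, the modulus, and both function names are made up for illustration. It assumes the puzzle state eventually repeats, which is one common reason the 1 000 000 000 000th value can be found without simulating every step.

```python
def step(x: int) -> int:
    """Hypothetical update rule standing in for a puzzle's state transition."""
    return (x * x + 1) % 1_000_003


def nth_brute_force(start: int, n: int) -> int:
    """Apply step() n times. Fine for n = 100, hopeless for n = 10**12."""
    x = start
    for _ in range(n):
        x = step(x)
    return x


def nth_with_cycle_skip(start: int, n: int) -> int:
    """Same result as nth_brute_force, but skips ahead once a state repeats."""
    first_seen = {}  # state -> number of steps taken when it first appeared
    x = start
    i = 0
    while i < n:
        if x in first_seen:
            # States from first_seen[x] onward repeat with this period,
            # so only the remainder of the remaining steps matters.
            cycle_len = i - first_seen[x]
            for _ in range((n - i) % cycle_len):
                x = step(x)
            return x
        first_seen[x] = i
        x = step(x)
        i += 1
    return x


if __name__ == "__main__":
    print(nth_brute_force(2, 100))          # "part 1" sized input
    print(nth_with_cycle_skip(2, 10**12))   # "part 2" sized input
```

The brute-force version does work proportional to `n`, while the cycle-skipping version does work proportional to the number of distinct states visited before the first repeat. That only helps when such a repeat exists and the state fits in memory, so it is one illustrative trick rather than a general recipe for part 2.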