Ben Bales writes:
I saw a presentation from Costa Huang (a graduate researcher working in reinforcement learning) who had some material which I thought would be interesting for the blog. (Full disclosure: Costa and I work at the same company.)
Apparently it can be a bit tricky to reproduce results in reinforcement learning even when code is shared because there’s lots and lots of extra details embedded in the code that aren’t always totally explained in words.
It all just sounded similar to MCMC-land by analogy — a billion tricks floating around, some things work in some places, some things make implementation simpler, some things make models more general, some defaults just seem to work weirdly well — hard to keep track of it all!
Here is a blog post they worked on to document, with citations, all of the tips-n-tricks/implementation details that go into getting the PPO (proximal policy optimization) reinforcement learning algorithm working well.
There’s a GitHub repo that goes along with this that includes one-file implementations of a lot of these algorithms. If you click through the files here you can navigate the details of a variety of different RL algorithms. The repository’s main page has a bunch of info on how to use the code, but the implementation files are interesting to look at on their own.
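To give a flavor of the kind of detail involved: two of the commonly cited PPO implementation details are normalizing advantages within each minibatch and clipping the probability ratio. A minimal sketch of just the clipped loss in NumPy (the function name and shapes are my own illustration, not code from the repo):

```python
import numpy as np

def ppo_clip_loss(ratio, adv, clip_eps=0.2):
    """Clipped surrogate loss with per-minibatch advantage normalization."""
    # Detail: normalize advantages within the minibatch (mean 0, std 1)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    unclipped = ratio * adv
    # Detail: clip the probability ratio to [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Pessimistic bound: elementwise minimum, negated to make a loss
    return -np.minimum(unclipped, clipped).mean()

# With ratio == 1 everywhere the clip is inactive, and the normalized
# advantages average to ~0, so the loss is ~0
loss = ppo_clip_loss(np.ones(8), np.arange(8.0))
```

Neither detail is in the one-line description of "PPO," but both affect results, which is exactly the reproducibility problem being described.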
Anyway, implementation details! Often frustrating if you need to figure them out to get something working, but kinda interesting when someone else has done the work of cleanly documenting them!
I don’t know anything about this, but, yeah, it’s great to get all the steps that are needed to get these algorithms to run.
I am not familiar with the details of implementing PPO, but in general if some detail is relevant, shouldn’t it be part of the algorithm?
Usually I find it particularly troublesome if anything, including unit tests, is very sensitive to seeding. This usually means that the algorithm is not very robust, or the problem itself is so difficult that the results are not credible (e.g., optimizing an objective with many local maxima starting from random points; the same applies to MCMC).
That said, documenting these things is great. I hate it when I come across fudge factors or undocumented adjustments in the source for an algorithm that I am trying to implement, with no explanation, or a cryptic one.
Tamas:
For two recent high-profile examples of results breathlessly reported without details of tuning, see the post by the Google engineer discussed here and the paper by the Google engineers discussed here.
> shouldn’t it be part of the algorithm?
I guess ideally, but algorithms can also change a lot depending on where they’re implemented and still carry the same name. An eigensolver, for instance, has a lot of variations depending on which eigenvalues you want. In this case, the PPOs vary in different ways depending on what game you’re trying to play.
These things get so complicated these days though it’s hard to keep up.
We’ve done a reading group thing at work a few times where we randomly pick papers from conferences and try to give 1-slide presentations on them. I got this one: https://openaccess.thecvf.com/content/CVPR2022/papers/Di_GPV-Pose_Category-Level_Object_Pose_Estimation_via_Geometry-Guided_Point-Wise_Voting_CVPR_2022_paper.pdf
The complexity is through the roof! I had to dig into a couple of background papers to even get a vague sense of what is going on. Not that that’s bad. I wouldn’t get far randomly picking a paper in a math journal either.
> Usually I find it particularly troublesome if anything, including unit tests, is very sensitive to seeding
Oh yeah, I like to just avoid seeds. If the thing is random, the thing is random, and it’s nice to test it that way. Seeds are useful for triaging specific bugs and whatnot, but the test is more valuable with more random draws.
Of course, that sort of thinking might fail in a bigger codebase, and random failures in automated testing are annoying when the answer is “just try again.”
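A minimal sketch of the seed-free testing idea, assuming a toy randomized routine whose invariant should hold for every seed (the function and names here are hypothetical, just to illustrate):

```python
import numpy as np

def shuffled_sum(x, rng):
    # Toy "randomized" routine: the shuffle order is random, but the
    # sum is invariant, so the test needs no pinned seed
    y = x.copy()
    rng.shuffle(y)
    return y.sum()

# Test the invariant across many fresh, unseeded generators rather than
# pinning one seed -- if this ever fails, the bug is real, not a fluke
x = np.arange(100.0)
results = [shuffled_sum(x, np.random.default_rng()) for _ in range(50)]
ok = all(abs(r - x.sum()) < 1e-9 for r in results)
```

The flaky-CI worry above still applies when the tested property is itself statistical (a tolerance on a mean, say) rather than an exact invariant like this one.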