Ben Bales writes:
I saw a presentation from Costa Huang (a graduate researcher working in reinforcement learning) who had some material which I thought would be interesting for the blog. (Full disclosure: Costa and I work at the same company.)
Apparently it can be a bit tricky to reproduce results in reinforcement learning even when code is shared, because there are lots and lots of extra details embedded in the code that aren't always fully explained in words.
It all just sounded similar to MCMC-land by analogy — a billion tricks floating around, some things work in some places, some things make implementation simpler, some things make models more general, some defaults just seem to work weirdly well — hard to keep track of it all!
Here is a blog post they worked on that documents, with citations, all the tips-n-tricks/implementation details that go into getting the PPO (proximal policy optimization) reinforcement learning algorithm working well.
There’s a GitHub repo that goes along with this that includes single-file implementations of a lot of these algorithms. If you click through the files there you can navigate the details of a variety of different RL algorithms. The repository’s main page has a bunch of info on how to use the code, but the implementation files are interesting to look at on their own, too.
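To give a flavor of the kind of detail being documented: here's a minimal sketch (mine, not from that repo) of PPO's clipped surrogate loss with per-batch advantage normalization, which is one of the commonly cited implementation tricks. The function name and the tiny epsilon are illustrative choices, not anything from the original sources.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss, sketched in numpy for clarity.

    logp_new / logp_old: per-sample log-probabilities of the taken
    actions under the current and the data-collecting policy.
    """
    # Implementation detail: normalize advantages within the batch.
    # The 1e-8 guards against division by zero -- a typical choice.
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Probability ratio between new and old policies.
    ratio = np.exp(logp_new - logp_old)

    # Clipped surrogate objective: take the pessimistic (smaller) of
    # the unclipped and clipped terms, then negate because optimizers
    # minimize.
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.minimum(unclipped, clipped).mean()
```

Note that when the new and old policies agree (ratio of 1 everywhere), the loss is just the negated mean of the normalized advantages, which is essentially zero — the clipping only kicks in once the policy moves away from the one that collected the data.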
Anyway, implementation details! Often frustrating if you need to figure them out to get something working, but kinda interesting when someone else has done the work of cleanly documenting them!
I don’t know anything about this, but, yeah, it’s great to get all the steps that are needed to get these algorithms to run.