eachro 20 hours ago

During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the gym environments were small and tractable enough that a hobbyist could learn a little bit of RL, try it out on CartPole, and see how it'd perform. Are there similarly tractable RL tasks/learning environments with LLMs? From the outside, my impression is that you need some insane GPU access to even start to mess around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?

  • al_th 17 hours ago

    This is entirely doable.

    I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind DeepSeek's latest model.

    I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top of that, which in itself is very simple.

    I made a GitHub repo if you want to try it out: https://github.com/Al-th/grpo_experiment
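
    To give a flavor of what "very simple" means here: the heart of GRPO is a group-relative advantage plugged into a policy-gradient loss. Here's a rough numpy sketch of that idea (purely illustrative, not code from the repo; the rewards and log-probs are made up):

      import numpy as np

      # For one prompt, sample a group of G completions from the current policy
      # and score each with a reward (made-up 0/1 "correct answer" rewards here).
      rewards = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0])

      # Group-relative advantage: normalize each reward against the group's own
      # mean and std, so no learned value network is needed.
      advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

      # Weight each completion's (summed) token log-probs by its advantage for a
      # REINFORCE-style loss. The full method also clips against the old policy
      # (PPO-style) and adds a KL penalty toward a reference model.
      logprobs = np.random.randn(len(rewards))   # stand-in for real log-probs
      loss = -(advantages * logprobs).mean()
      print(loss)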

    • 363849473754 14 hours ago

      The GRPO project is neat. Would you be willing to do a Karpathy-style explainer, breaking down the algorithm from scratch? It's hard to understand on its own without prior background knowledge.

      • currymj 10 hours ago

        Find materials on PPO, which should be widespread since it is the most popular RL algorithm. GRPO works on the same principles; it just makes certain estimates from samples rather than training an auxiliary neural network to make them.
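
        To make the contrast concrete, here's a toy numpy sketch (my own illustration, made-up numbers) of the clipped surrogate loss the two algorithms share; the difference is mainly where `advantage` comes from:

          import numpy as np

          # Clipped surrogate objective used by both PPO and GRPO (toy values).
          old_logprob = np.array([-1.2, -0.7, -2.1])
          new_logprob = np.array([-1.0, -0.9, -1.8])
          advantage = np.array([0.5, -0.3, 1.1])
          eps = 0.2

          ratio = np.exp(new_logprob - old_logprob)   # pi_new / pi_old
          loss = -np.minimum(ratio * advantage,
                             np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()
          print(loss)

          # PPO:  advantage = returns minus V(s), where V is a separately trained
          #       value network (usually with GAE).
          # GRPO: advantage = (r - mean of group rewards) / std of group rewards,
          #       estimated directly from a group of sampled completions.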

dualofdual a day ago

The best lectures on Reinforcement Learning and related topics are by Dimitri Bertsekas: https://web.mit.edu/dimitrib/www/home.html

  • esafak a day ago

    His books tend to be dry and geared towards researchers, in my opinion. He has a new one on RL: https://web.mit.edu/dimitrib/www/RLCOURSECOMPLETE%202ndEDITI...

  • richard___ a day ago

    No. They are outdated and focused on strange things. You won't understand PPO from his textbooks.

    • cplat a day ago

      Which aspects? Foundational textbooks would focus on principles, not necessarily implementations, and don't go "outdated" the same way a snippet does.

  • forkerenok a day ago

    Would you mind explicitly indicating whether you have reviewed the submitted materials? And if so, why are they inferior to the material you linked?

    Not trying to catch you out; genuine interest.

jgord a day ago

Highly recommended .. even the main contents diagram is a great visual overview of RL in general, as is the 30-minute intro YouTube video.

I'm expecting to see a lot of hyper-growth startups using RL to solve real-world problems in engineering / logistics / medicine.

LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.

  • RiDiracTid 12 hours ago

    RL is definitely really cool, but I heavily doubt that we're going to see 'hyper growth' from RL outside of the context of maybe training reasoning LLMs.

    During the ~2012-2019 period of AI research, DeepMind (the undisputed leader in money and talent) went all in on RL to solve problems, and while they did lots of interesting and useful work, there wasn't anything so extraordinary or revolutionary that it massively accelerated the field, nor any sort of crazy breakthrough.

    Their over-focus on RL instead of transformers/LLMs is what allowed OpenAI to surprise everyone and overtake DeepMind.

    Yes, RL is a useful tool, but outside the context of training LLMs for reasoning there isn't really any breakthrough that makes it more than an interesting tool for certain situations.

  • almostgotcaught 21 hours ago

    > I'm expecting to see a lot of hyper-growth startups using RL to solve real-world problems in engineering / logistics / medicine.

    I love when people on HN make market predictions based on how revolutionary they think something is. I guess startup people think they're also VC people.

    FYI, Sutton's book came out in 1999; none of this is revolutionary anymore, and yet I don't see any "hyper growth". The reason is exactly that while you can train these models to play Super Mario, you cannot use them to solve real-world problems.

    https://www.google.com/books/edition/Reinforcement_Learning/...

    • jgord 19 hours ago

      Sure.. and neural networks came out a very long time ago, but are now arguably approaching usefulness in LLMs.

      Perhaps that's because it takes a while for the ideas to get polished/weeded out and diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

      re : "RL cannot be used to solve real world problems" .. well, I would argue that these are useful real-world problems :

        - predict protein folding structure from DNA sequence
        - stabilizing high temperature fusion plasma
        - improving weather forecasting efficiency
        - improve DeepSeek's recent LLM model

      I'm currently using RL techniques to find 3D geometry - pipes, beams, walls - in point clouds. It is of practical benefit, as a lot of this is currently done manually, ballpark $5Bn/yr.

      But I concede I cannot point to a plethora of small startups using RL for these real-world problems .. yet.

      This is a prediction, and I could be wrong in many ways - not least that LLMs digest RL in full, learn to express their logical reasoning, approach AGI, and use RL internally, thereby subsuming and automating the use of RL.

      Are VCs better at predicting the future? I guess that is their job, and they have money on the line... but I think even they would admit they need a large portfolio to capture the unicorns.

      VCs probably get a less detailed tech view than founders, but the large number of pitches they review should give them a noisy but wider overview of the whole bleeding edge of innovation.

      I think startup founders are in the same future prediction business .. and arguably have more skin in the game.

      Predictions would be pretty useless if they weren't somewhat controversial - a prediction we all agree on doesn't say much. Come back and chastise me if we don't see more RL startups in 12 months' time!

      • almostgotcaught 18 hours ago

        > Come back and chastise me if we don't see more RL startups in 12 months' time!

        1999 was 26 years ago, but ya sure, this is the year they finally take off.

        > Perhaps that's because it takes a while for the ideas to get polished/weeded out and diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

        Or perhaps it could be that you're wrong and they're useless? Nah, that couldn't be it.

    • currymj 5 hours ago

      Generally, you are right in spirit.

      However, multi-armed bandit algorithms are highly useful in practice. These are a special case of RL (RL with one state, essentially; see the sketch below).

      There are even some extensions of applied bandit algorithms to "true RL", e.g. for recommender systems that want to consider history.

      This is the place to look for real-world applications of RL.

      Also, RL uses importance-sampling estimators of the gradient. These sometimes show up in other applications, though not framed as "RL".
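
      Here's the bandit sketch mentioned above: a minimal epsilon-greedy loop (purely illustrative; imagine the arms are candidate recommendations, the rewards are clicks, and the click-through rates are made up):

        import random

        true_ctr = [0.02, 0.05, 0.03]   # hidden click-through rate per arm
        counts = [0, 0, 0]
        values = [0.0, 0.0, 0.0]        # running mean reward per arm
        eps = 0.1

        for _ in range(100_000):
            # Explore with probability eps, otherwise exploit the best estimate.
            arm = random.randrange(3) if random.random() < eps else values.index(max(values))
            reward = 1.0 if random.random() < true_ctr[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

        print(values)   # should roughly recover true_ctr, mostly pulling arm 1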

    • bitvoid 12 hours ago

      > you cannot use them to solve real world problems

      Don't Waymo and other self-driving systems use reinforcement learning? I thought it was used in robotics as well (e.g., bipedal and quadrupedal locomotion).

    • smokel 20 hours ago

      Reinforcement learning is hard to apply to real-world problems, but one cannot deny the success that a company such as OpenAI has.

    • CamperBob2 10 hours ago

      "FYI Maxwell's paper came out in 1865 and now it's 1896 and Marconi's radio, which he invented a whole year ago, still doesn't pick up anything but buzzes and static. The reason is exactly because while you can manipulate the electromagnetic field with current fluctuations, you cannot use it to solve real world problems."

lemonlym a day ago

Another great resource on RL is Mykel Kochenderfer's suite of textbooks: https://algorithmsbook.com/

  • noobly a day ago

    These books are all RL? I've got the decision one; I didn't think the others had anything to do with RL.

    • jvanderbot a day ago

      He (the author) has a strong proclivity for policy-based planning, shall we say.

hazrmard 8 hours ago

Thank you. This is great. I also appreciated the linked code for MinRL (https://github.com/10-OASIS-01/minrl).

Having done research in RL, I found that a big problem with incremental research was reproducing comparative works and validating my own contributions. A simple library like this, with built-in tools for visualization and a gridworld sandbox where I can validate just by observation, is very helpful!

Culonavirus 17 hours ago

> This book, however, requires the reader to have some knowledge of probability theory and linear algebra.

This is so funny to me. I see it often, and I'm always like "yeah, right, some knowledge"... These statements always need to be taken with a grain of salt and an understanding that math nerds wrote them. Average programmers with average math skills (like me), beware ;)

  • sigmoid10 17 hours ago

    This usually means that average university-level CS or EE students should be able to follow it easily, even if they have never touched the topic. It's far below the level of math and physics degrees, but still somewhat above what you could expect from an average self-taught programmer.

    • Culonavirus 4 hours ago

      I'm not even self-taught. It's just that when I was studying (CS degree, like 15 years ago) we did have mandatory linear algebra, graph theory, and statistics courses, etc., but we never *actually* used any of that in practice; it was all algo this, big O that, data structures, design patterns, languages, compilers, SQL, etc. Now that I'm thinking about it, pretty much the only course we had to use some linear algebra in was the 3D rendering one. ...

      And then you work on .NET/Java/SQL Server crap for a decade and you forget even the little math you used to know :D

monadicmonad a day ago

I don't know how to go from understanding this material to having a job in the field. Just stuck as a SWE for now.

  • godelski a day ago

      - Do you understand the material?
      - Can you utilize your understanding to build successful models/algorithms? 
    
    If the answer is yes to both, do some projects, put them on your GitHub, and update your resume. You might need to take a job at a lower position first, but you can jump from there. But I want to make sure that the answer is "yes" to both, and note that it is easy to think you understand something without actually understanding it. Importantly, we must recognize that everyone has a different threshold of knowledge at which they are comfortable saying that they "understand" a topic. One person might say they don't and be more knowledgeable than someone who says they do. But demonstrating the knowledge is at least a decent proxy for determining this.

    A way I like to gauge someone's understanding of things is by getting them to explain the limitations. These are often less explicitly stated in learning materials; a deeper understanding is acquired through experience and, most importantly, reflection on that experience. This is an underutilized tactic, but it is very effective. If you can't do this yet, the good news is that starting now will only accelerate your understanding :)

    • varelaseb a day ago

      Just a random thought:

      Understanding the limitations is a complicated thing in tech. You can finagle most systems into doing mostly anything, as inefficient as that may prove to be.

      The question then becomes: up to what point is it a "reasonably better than most others" solution? And that's a question of an understanding of a field, not a space in the field.

      • godelski a day ago

          > is a complicated thing in tech
        
        That's the point. Understanding complex things is what experts are supposed to do.

          > You can finagle most systems into doing mostly anything
        
        "most" is doing a lot of heavy lifting here and I think the point you're making isn't discrediting my point. Sure you can hamfist a lot of things into working but an expert should know when to use better tools. Being able to identify what would end up as a very hacky solution from one paradigm but could be efficient and/or elegant in another is what an expert should be able to identify. Essentially, are they able to reduce technical debt even before that debt is taken on?

          > an understanding of a field, not a space in the field.
        
        Would you mind clarifying the difference? I agree these are different things, but I'm not sure why understanding the limitations would imply not having narrower domain knowledge. Sure, in ML, knowing the advantages of convolutions over transformers and vice versa is good. But if you're working on LLMs, ViTs, or anything else, it is still good to know what the limitations of transformer models are, and specifically what attention can and cannot do. We should be able to get more and more narrow, too. An expert will be able to understand the nuances of specific evaluation methods: metrics, measures, datasets, and other forms of analysis. Being able to discuss nuance and detail is how you determine whether someone has expertise. IME it tends to be pretty easy to identify experts (even in other fields) by how readily and frequently they discuss nuances.

  • CamperBob2 10 hours ago

    Step 1: Build something cool with it.

shidoshi 11 hours ago

Amazing resource. Highly recommended for both content and approachability.