eachro 20 hours ago

During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the gym environments were small and tractable enough that a hobbyist could learn a little bit of RL, try it out on CartPole, and see how it'd perform. Are there similarly tractable RL tasks/learning environments with LLMs? From the outside, my impression is that you need some insane GPU access to even start to mess around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?

  • al_th 17 hours ago

    This is entirely doable.

    I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind DeepSeek's latest model.

    I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top of that, which in itself is very simple.

    I made a GitHub repo if you want to try it out: https://github.com/Al-th/grpo_experiment
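
    To give a flavor of what "very simple" means here: the heart of GRPO is a group-relative advantage plugged into a policy-gradient loss. Here's a rough numpy sketch of that idea (purely illustrative, not code from the repo; the rewards and log-probs are made up):

      import numpy as np

      # For one prompt, sample a group of G completions from the current policy
      # and score each with a reward (made-up 0/1 "correct answer" rewards here).
      rewards = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0])

      # Group-relative advantage: normalize each reward against the group's own
      # mean and std, so no learned value network is needed.
      advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

      # Weight each completion's (summed) token log-probs by its advantage for a
      # REINFORCE-style loss. The full method also clips against the old policy
      # (PPO-style) and adds a KL penalty toward a reference model.
      logprobs = np.random.randn(len(rewards))   # stand-in for real log-probs
      loss = -(advantages * logprobs).mean()
      print(loss)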

    • 363849473754 14 hours ago

      The GRPO project is neat. Would you be willing to do a Karpathy-style explainer, breaking down the algorithm from scratch? It's hard to understand on its own without prior background knowledge.

      • currymj 10 hours ago

        Find materials on PPO, which should be widespread since it is the most popular RL algorithm. GRPO works on the same principles; it just makes certain estimates from samples rather than training an auxiliary neural network to make them.
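
        To make the contrast concrete, here's a toy numpy sketch (my own illustration, made-up numbers) of the clipped surrogate loss the two algorithms share; the difference is mainly where `advantage` comes from:

          import numpy as np

          # Clipped surrogate objective used by both PPO and GRPO (toy values).
          old_logprob = np.array([-1.2, -0.7, -2.1])
          new_logprob = np.array([-1.0, -0.9, -1.8])
          advantage = np.array([0.5, -0.3, 1.1])
          eps = 0.2

          ratio = np.exp(new_logprob - old_logprob)   # pi_new / pi_old
          loss = -np.minimum(ratio * advantage,
                             np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()
          print(loss)

          # PPO:  advantage = returns minus V(s), where V is a separately trained
          #       value network (usually with GAE).
          # GRPO: advantage = (r - mean of group rewards) / std of group rewards,
          #       estimated directly from a group of sampled completions.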

dualofdual a day ago

The best lectures on Reinforcement Learning and related topics are by Dimitri Bertsekas: https://web.mit.edu/dimitrib/www/home.html

  • esafak a day ago

    His books tend to be dry and geared towards researchers, in my opinion. He has a new one on RL: https://web.mit.edu/dimitrib/www/RLCOURSECOMPLETE%202ndEDITI...

  • richard___ a day ago

    No. They are outdated and focused on strange things. You won't understand PPO from his textbooks.

    • cplat a day ago

      Which aspects? Foundational textbooks would focus on principles, not necessarily implementations, and don't go "outdated" the same way a snippet does.

  • forkerenok a day ago

    Would you mind explicitly indicating whether you have reviewed the submitted materials? And if so, why are they inferior to the material you linked?

    Not trying to catch you out; genuine interest.

jgord a day ago

Highly recommended .. even the main contents diagram is a great visual overview of RL in general, as is the 30-minute intro YouTube video.

I'm expecting to see a lot of hyper-growth startups using RL to solve real-world problems in engineering / logistics / medicine.

LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.

  • RiDiracTid 12 hours ago

    RL is definitely really cool, but I heavily doubt that we're going to see 'hyper growth' from RL outside of the context of maybe training reasoning LLMs.

    During the ~2012-2019 period of AI research, DeepMind (the undisputed leader in money and talent) went all in on RL to solve problems, and while they did lots of interesting and useful work, there wasn't anything so extraordinary or revolutionary that it massively accelerated the field, nor any sort of crazy breakthrough.

    Their over-focus on RL instead of transformers/LLMs is what allowed OpenAI to surprise everyone and overtake DeepMind.

    Yes, RL is a useful tool, but outside the context of training LLMs for reasoning there isn't really any breakthrough that makes it more than an interesting tool for certain situations.

  • almostgotcaught 21 hours ago

    > I'm expecting to see a lot of hyper-growth startups using RL to solve real-world problems in engineering / logistics / medicine.

    I love when people on HN make market predictions based on how revolutionary they think something is. I guess startup people think they're also VC people.

    FYI, Sutton's book came out in 1999; none of this is revolutionary anymore, and yet I don't see any "hyper growth". The reason is exactly that while you can train these models to play Super Mario, you cannot use them to solve real-world problems.

    https://www.google.com/books/edition/Reinforcement_Learning/...

    • jgord 19 hours ago

      Sure.. and neural networks came out a very long time ago, but are now arguably approaching usefulness in LLMs.

      Perhaps that's because it takes a while for the ideas to get polished/weeded out and diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

      re : "RL cannot be used to solve real world problems" .. well, I would argue that these are useful real-world problems :

        - predict protein folding structure from DNA sequence
        - stabilizing high temperature fusion plasma
        - improving weather forecasting efficiency
        - improve DeepSeek's recent LLM model

      I'm currently using RL techniques to find 3D geometry - pipes, beams, walls - in point clouds. It is of practical benefit, as a lot of this is currently done manually, ballpark $5Bn/yr.

      But I concede I cannot point to a plethora of small startups using RL for these real-world problems .. yet.

      This is a prediction, and I could be wrong in many ways - not least that LLMs digest RL in full, learn to express their logical reasoning, approach AGI, and use RL internally, thereby subsuming and automating the use of RL.

      Are VCs better at predicting the future? I guess that is their job, and they have money on the line... but I think even they would admit they need a large portfolio to capture the unicorns.

      VCs probably get a less detailed tech view than founders, but the large number of pitches they review should give them a noisy but wider overview of the whole bleeding edge of innovation.

      I think startup founders are in the same future prediction business .. and arguably have more skin in the game.

      Predictions would be pretty useless if they weren't somewhat controversial - a prediction we all agree on doesn't say much. Come back and chastise me if we don't see more RL startups in 12 months' time!

      • almostgotcaught 18 hours ago

        > Come back and chastise me if we don't see more RL startups in 12 months' time!

        1999 was 26 years ago, but ya sure, this is the year they finally take off.

        > Perhaps that's because it takes a while for the ideas to get polished/weeded out and diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

        Or perhaps it could be that you're wrong and they're useless? Nah, that couldn't be it.

    • currymj 5 hours ago

      Generally, you are right in spirit.

      However, multi-armed bandit algorithms are highly useful in practice. These are a special case of RL (RL with one state, essentially; see the sketch below).

      There are even some extensions of applied bandit algorithms to "true RL", e.g. for recommender systems that want to consider history.

      This is the place to look for real-world applications of RL.

      Also, RL uses importance-sampling estimators of the gradient. These sometimes show up in other applications, though not framed as "RL".
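
      Here's the bandit sketch mentioned above: a minimal epsilon-greedy loop (purely illustrative; imagine the arms are candidate recommendations, the rewards are clicks, and the click-through rates are made up):

        import random

        true_ctr = [0.02, 0.05, 0.03]   # hidden click-through rate per arm
        counts = [0, 0, 0]
        values = [0.0, 0.0, 0.0]        # running mean reward per arm
        eps = 0.1

        for _ in range(100_000):
            # Explore with probability eps, otherwise exploit the best estimate.
            arm = random.randrange(3) if random.random() < eps else values.index(max(values))
            reward = 1.0 if random.random() < true_ctr[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

        print(values)   # should roughly recover true_ctr, mostly pulling arm 1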

    • bitvoid 12 hours ago

      > you cannot use them to solve real world problems

      Don't Waymo and other self-driving systems use reinforcement learning? I thought it was used in robotics as well (e.g., bipedal and quadrupedal locomotion).

    • smokel 20 hours ago

      Reinforcement learning is hard to apply to real-world problems, but one cannot deny the success that a company such as OpenAI has.

    • CamperBob2 10 hours ago

      "FYI Maxwell's paper came out in 1865 and now it's 1896 and Marconi's radio, which he invented a whole year ago, still doesn't pick up anything but buzzes and static. The reason is exactly because while you can manipulate the electromagnetic field with current fluctuations, you cannot use it to solve real world problems."

lemonlym a day ago

Another great resource on RL is Mykel Kochenderfer's suite of textbooks: https://algorithmsbook.com/

  • noobly a day ago

    These books are all RL? I've got the decision one; I didn't think the others had anything to do with RL.

    • jvanderbot a day ago

      He (the author) has a strong proclivity for policy-based planning, shall we say.

hazrmard 8 hours ago

Thank you. This is great. I also appreciated the linked code for MinRL (https://github.com/10-OASIS-01/minrl).

Having done research in RL, I found that a big problem with incremental research was reproducing comparative works and validating my own contributions. A simple library like this, with built-in tools for visualization and a gridworld sandbox where I can validate just by observation, is very helpful!

Culonavirus 17 hours ago

> This book, however, requires the reader to have some knowledge of probability theory and linear algebra.

This is so funny to me. I see it often, and I'm always like "yeah, right, some knowledge"... These statements always need to be taken with a grain of salt and an understanding that math nerds wrote them. Average programmers with average math skills (like me), beware ;)

  • sigmoid10 17 hours ago

    This usually means that average university-level CS or EE students should be able to follow it easily, even if they have never touched the topic. It's far below the level of math and physics degrees, but still somewhat above what you could expect from an average self-taught programmer.

    • Culonavirus 4 hours ago

      I'm not even self-taught. It's just that when I was studying (CS degree, like 15 years ago) we did have mandatory linear algebra, graph theory, and statistics courses, etc., but we never *actually* used any of that in practice; it was all algo this, big O that, data structures, design patterns, languages, compilers, SQL, etc. Now that I'm thinking about it, pretty much the only course we had to use some linear algebra in was the 3D rendering one. ...

      And then you work on .NET/Java/SQL Server crap for a decade and you forget even the little math you used to know :D

monadicmonad a day ago

I don't know how to go from understanding this material to having a job in the field. Just stuck as a SWE for now.

  • godelski a day ago

      - Do you understand the material?
      - Can you utilize your understanding to build successful models/algorithms? 
    
    If the answer is yes to both, do some projects, put them on your GitHub, and update your resume. You might need to take a job at a lower position first, but you can jump from there. But I want to make sure that the answer is "yes" to both, and note that it is easy to think you understand something without actually understanding it. Importantly, we must recognize that everyone has a different threshold of knowledge at which they are comfortable saying that they "understand" a topic. One person might say they don't and be more knowledgeable than someone who says they do. But demonstrating the knowledge is at least a decent proxy for determining this.

    A way I like to gauge someone's understanding of things is by getting them to explain the limitations. These are often less explicitly stated in learning materials; a deeper understanding is acquired through experience and, most importantly, reflection on that experience. This is an underutilized tactic, but it is very effective. If you can't do this yet, the good news is that starting now will only accelerate your understanding :)

    • varelaseb a day ago

      Just a random thought:

      Understanding the limitations is a complicated thing in tech. You can finagle most systems into doing mostly anything, as inefficient as that may prove to be.

      The question then becomes: up to what point is it a "reasonably better than most others" solution? And that's a question of an understanding of a field, not a space in the field.

      • godelski a day ago

          > is a complicated thing in tech
        
        That's the point. Understanding complex things is what experts are supposed to do.

          > You can finagle most systems into doing mostly anything
        
        "most" is doing a lot of heavy lifting here and I think the point you're making isn't discrediting my point. Sure you can hamfist a lot of things into working but an expert should know when to use better tools. Being able to identify what would end up as a very hacky solution from one paradigm but could be efficient and/or elegant in another is what an expert should be able to identify. Essentially, are they able to reduce technical debt even before that debt is taken on?

          > an understanding of a field, not a space in the field.
        
        Would you mind clarifying the difference? I agree these are different things, but I'm not sure why understanding the limitations would imply not having narrower domain knowledge. Sure, in ML, knowing the advantages of convolutions over transformers and vice versa is good. But if you're working on LLMs, ViTs, or anything else, it is still good to know what the limitations of transformer models are, and specifically what attention can and cannot do. We should be able to get more and more narrow, too. An expert will be able to understand the nuances of specific evaluation methods: metrics, measures, datasets, and other forms of analysis. Being able to discuss nuance and detail is how you determine whether someone has expertise. IME it tends to be pretty easy to identify experts (even in other fields) by how readily and frequently they discuss nuances.

  • CamperBob2 10 hours ago

    Step 1: Build something cool with it.

shidoshi 11 hours ago

Amazing resource. Highly recommended for both content and approachability.