<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[One Minute NLP]]></title><description><![CDATA[Every week, I break down one key NLP/Generative AI concept into a single, easy-to-digest slide so you can stay on top of the field in just one minute a week.]]></description><link>https://oneminutenlp.com</link><image><url>https://substackcdn.com/image/fetch/$s_!ocF2!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7cd7dd-0749-4103-8912-73e959da24ac_880x880.png</url><title>One Minute NLP</title><link>https://oneminutenlp.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 20:32:01 GMT</lastBuildDate><atom:link href="https://oneminutenlp.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Dasha Herrmannova]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[oneminutenlp@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[oneminutenlp@substack.com]]></itunes:email><itunes:name><![CDATA[Dasha Herrmannova]]></itunes:name></itunes:owner><itunes:author><![CDATA[Dasha Herrmannova]]></itunes:author><googleplay:owner><![CDATA[oneminutenlp@substack.com]]></googleplay:owner><googleplay:email><![CDATA[oneminutenlp@substack.com]]></googleplay:email><googleplay:author><![CDATA[Dasha Herrmannova]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Model Context Protocol]]></title><description><![CDATA[MCP is an open standard that defines how LLMs integrate with external tools, data, and context.]]></description><link>https://oneminutenlp.com/p/model-context-protocol</link><guid 
isPermaLink="false">https://oneminutenlp.com/p/model-context-protocol</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Mon, 25 Aug 2025 19:31:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gcWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Hi everyone &#128075;</p><p>Before today&#8217;s topic, a quick note: I stepped away from the newsletter in April after a family loss; I needed time to slow down and just be. I&#8217;m back and excited to restart <strong>One Minute NLP</strong>: one slide, one concept, once a week to help you keep up with NLP/GenAI.</p><p>I really appreciate you sticking around. If you&#8217;d rather unsubscribe, you can do that anytime at the bottom of this email (or click <a href="https://substack.com/settings">here</a> if you&#8217;re reading the web version).</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gcWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gcWL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!gcWL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 848w, 
https://substackcdn.com/image/fetch/$s_!gcWL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!gcWL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gcWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76706,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160386660?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gcWL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!gcWL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 
848w, https://substackcdn.com/image/fetch/$s_!gcWL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!gcWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c4daa3-40a5-4ed0-a817-9edeea0cc3e6_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Model Context Protocol</h1><p><strong>Model Context Protocol</strong> (MCP) is an open standard for giving language models 
structured access to tools, data, and prewritten prompts. Instead of relying on ad-hoc prompt engineering or custom retrieval pipelines, tools and data sources are exposed to models through <strong>MCP servers</strong> in a standardized format. Developers can plug in existing tools (both cloud-based, e.g., GitHub and Slack, and local, e.g., a company database) rather than reinventing integrations each time. This approach promotes reusability across models and projects. An <strong>MCP client</strong> (usually embedded in an AI app or agent, the <strong>MCP Host</strong>) queries these servers to retrieve information or invoke actions.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M8p-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M8p-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 424w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 848w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 1272w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!M8p-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png" width="828" height="213" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:213,&quot;width&quot;:828,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160386660?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M8p-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 424w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 848w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 1272w, https://substackcdn.com/image/fetch/$s_!M8p-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef24bc7-939c-4e57-847b-a6d237691d87_828x213.png 1456w" 
sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Relationship between MCP Host, Clients, and Servers.</figcaption></figure></div><p>For MCP to work, the model (or its agent wrapper) must support <strong>tool use</strong>. MCP doesn&#8217;t overcome model limits like context window size or reasoning errors; it merely provides a standard for managing model context.</p><h2>Further reading</h2><ul><li><p><a href="https://modelcontextprotocol.io/quickstart/server">Build an MCP Server</a> and <a href="https://modelcontextprotocol.io/quickstart/client">Build an MCP Client</a> tutorials by ModelContextProtocol.io &#8212; The official MCP documentation is a great place to get started with MCP. The tutorials are easy to follow and can give you a solid grasp of how MCP Servers and Clients work.</p></li><li><p><a href="https://github.com/modelcontextprotocol/servers">Model Context Protocol servers</a> &#8212; This is a very comprehensive list of existing MCP servers.</p></li><li><p><a href="https://blog.sshh.io/p/everything-wrong-with-mcp">Everything Wrong with MCP</a> by Shrivu Shankar &#8212; This article does a great job explaining the limitations of MCP.</p></li></ul><div><hr></div><h3>Download complete One Minute NLP</h3><p>Do you want to use One Minute NLP slides as a quick reference? You can download the complete set of all past One Minute NLP topics here: <a href="https://complete.oneminutenlp.com">complete.oneminutenlp.com</a>.</p><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. 
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[In-context learning]]></title><description><![CDATA[In-context learning is a prompting technique that allows LLMs to learn and adapt to new tasks based on examples provided within the input prompt, without requiring additional training or fine-tuning.]]></description><link>https://oneminutenlp.com/p/in-context-learning</link><guid isPermaLink="false">https://oneminutenlp.com/p/in-context-learning</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Fri, 11 Apr 2025 03:02:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZwoY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZwoY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZwoY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZwoY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png" width="960" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66070,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160386606?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZwoY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ZwoY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5c4fae-7854-45a6-a44d-e83ddedc128d_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>In-context learning</h1><p>In-context learning is a prompting technique where LLMs learn how to handle new tasks using examples in the prompt, without any weight updates. 
Unlike training or fine-tuning, which changes model parameters, in-context learning utilizes input-output pairs provided in the prompt to guide LLM behavior, even for tasks the LLM wasn&#8217;t explicitly trained on.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GjWs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GjWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 424w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 848w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 1272w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GjWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png" width="870" height="166" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:166,&quot;width&quot;:870,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67248,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160386606?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GjWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 424w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 848w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 1272w, https://substackcdn.com/image/fetch/$s_!GjWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b4d2cb-421d-4ab4-a96a-7f01d1729e9a_870x166.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Simple few-shot prompt example</figcaption></figure></div><p>This allows rapid task adaptation without the compute or data needed for fine-tuning. 
Supplying examples is called <strong>few-shot learning</strong>; using none is <strong>zero-shot</strong>. Few-shot prompting often outperforms zero-shot prompting.</p><p>Performance depends heavily on the task, the number and order of examples, and how well they&#8217;re chosen. A few well-curated examples can match the performance of a fully fine-tuned model, while poorly selected ones can significantly degrade it.</p><h2>Further reading</h2><ul><li><p><a href="https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html">Language Models are Few-Shot Learners</a> by Brown et al. &#8212; This paper introduced GPT-3 and the concept of in-context learning.</p></li><li><p><a href="https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/">Prompt Engineering</a> by Lilian Weng &#8212; This fantastic article explains many prompt engineering techniques, including few-shot prompting, and provides tips for selecting good examples.</p></li><li><p><a href="https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part?open=false#%C2%A7few-shot-learning">Practical Prompt Engineering</a> by Cameron Wolfe &#8212; Drawing on a wide range of research articles, this deep dive covers many common prompting techniques, including few-shot prompting.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Every week, I break down one key NLP or Generative AI concept into a single, easy-to-digest slide. 
No fluff&#8212;just the core idea, explained clearly, so you can stay sharp in just one minute a week.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Get your weekly NLP cheat sheet</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? 
Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Reflection agents]]></title><description><![CDATA[Reflection is a simple but powerful technique for improving the quality of LLM responses by prompting the LLM to reflect on its own output to identify gaps or possibilities for improvement.]]></description><link>https://oneminutenlp.com/p/reflection-agents</link><guid isPermaLink="false">https://oneminutenlp.com/p/reflection-agents</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Sun, 30 Mar 2025 22:01:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mmUk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mmUk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mmUk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!mmUk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!mmUk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mmUk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mmUk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71413,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160200053?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mmUk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!mmUk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!mmUk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 
1272w, https://substackcdn.com/image/fetch/$s_!mmUk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2a6fa7-45f7-4da6-8f36-8624e97514b7_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Reflection agents</h1><p><strong>Reflection</strong> is a prompting strategy used in <strong>agent workflows</strong> to iteratively improve an LLM&#8217;s reasoning or outputs. Instead of directly returning a final answer, the LLM is first prompted to reflect on its initial response and identify errors, gaps, or opportunities for improvement. 
It then uses this feedback to generate a revised response. This can be repeated multiple times until some stopping condition is met (e.g., max iterations, token limit, or some evaluation criteria).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fgG6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fgG6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 424w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 848w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 1272w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fgG6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png" width="869" height="184" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:184,&quot;width&quot;:869,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75774,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/160200053?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fgG6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 424w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 848w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 1272w, https://substackcdn.com/image/fetch/$s_!fgG6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df15043-d19f-470a-b456-19dc8c66ed9e_869x184.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Common modules used in reflection agents.</figcaption></figure></div><p>This process has been shown to improve performance over direct response generation on a variety of tasks. 
Some agents add an explicit <strong>evaluation step</strong> that runs before reflection. Evaluation can be done via an LLM judge, external tools (e.g., a code compiler or a knowledge base), or other methods (e.g., a heuristic score). Reflection can be used in combination with other agent design patterns such as the <strong>ReAct</strong> framework.</p><h2>Further reading</h2><ul><li><p><a href="https://arxiv.org/abs/2303.11366">Reflexion: Language Agents with Verbal Reinforcement Learning</a> by Shinn et al. &#8212; A seminal paper on reflection that used separate evaluation and reflection modules. This paper is a great read if you want to understand reflection in more depth. The appendix includes the prompts the authors used in their experiments.</p></li><li><p>If you want to try running a reflection agent yourself, both <a href="https://blog.langchain.dev/reflection-agents/">LangChain</a> and <a href="https://web.archive.org/web/20250322154619/https://docs.llamaindex.ai/en/stable/examples/agent/introspective_agent_toxicity_reduction/">LlamaIndex</a> include different variations of reflection agents.</p></li><li><p><a href="https://huyenchip.com/2025/01/07/agents.html#agent_overview">Agents</a> by Chip Huyen &#8212; This fantastic blog post covers much more than just Reflection, but if you are looking to understand agents and where Reflection fits in, this is a great place to start.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept.
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Top-k and top-p sampling]]></title><description><![CDATA[Top-k and top-p sampling are methods used to control the randomness and diversity of LLM outputs.]]></description><link>https://oneminutenlp.com/p/top-k-and-top-p-sampling</link><guid isPermaLink="false">https://oneminutenlp.com/p/top-k-and-top-p-sampling</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Thu, 20 Mar 2025 04:26:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2qCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2qCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!2qCL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2qCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de730971-4299-4f7d-b626-044462c934ac_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/159455454?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!2qCL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!2qCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde730971-4299-4f7d-b626-044462c934ac_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h1>Top-k and top-p sampling</h1><p><strong>Sampling</strong> is a method LLMs use to generate output tokens, with the next token selected randomly based on learned <strong>output token probabilities</strong>. Higher-probability tokens are more likely to be chosen, but random sampling can generate odd or nonsensical outputs if low-probability tokens are picked.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kKeH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kKeH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 424w, https://substackcdn.com/image/fetch/$s_!kKeH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 848w, https://substackcdn.com/image/fetch/$s_!kKeH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 1272w,
https://substackcdn.com/image/fetch/$s_!kKeH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kKeH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png" width="832" height="184" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:184,&quot;width&quot;:832,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48824,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/159455454?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kKeH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 424w, https://substackcdn.com/image/fetch/$s_!kKeH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 848w, https://substackcdn.com/image/fetch/$s_!kKeH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 
1272w, https://substackcdn.com/image/fetch/$s_!kKeH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60655d0e-012d-4a56-9e0a-dae16eb54a5b_832x184.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Comparison of top-k and top-p sampling.</figcaption></figure></div><p><strong>Top-k sampling</strong> truncates the distribution to the <em>k</em> most probable tokens, then randomly picks from this subset using renormalized probabilities. Top-1 sampling is equivalent to <strong>greedy decoding</strong>, which always picks the most probable token. Top-k sampling can struggle if the top <em>k</em> tokens include low-probability tokens or omit likely tokens. <strong>Top-p sampling</strong> (<strong>nucleus sampling</strong>) addresses this by instead keeping the smallest set of tokens whose cumulative probability mass reaches <em>p</em>.</p><p>Top-k and top-p sampling can be used with or as alternatives to <strong>temperature sampling</strong>.</p><h2>Further reading</h2><ul><li><p><a href="https://web.stanford.edu/~jurafsky/slp3/">Speech and Language Processing</a> by Jurafsky and Martin (free to read online) &#8212; Section 10.8 (Large Language Models: Generation by Sampling) provides a great introduction to sampling from language models.</p></li><li><p><a href="https://huggingface.co/blog/how-to-generate">How to generate text: using different decoding methods for language generation with Transformers</a> &#8212; If you prefer to learn with code, this Hugging Face blog post includes code examples alongside explanations of the most popular sampling techniques.</p></li><li><p><a href="https://arxiv.org/abs/1904.09751">The Curious Case of Neural Text Degeneration</a> by Holtzman et al.
&#8212; This paper, which introduced top-p sampling, includes comparisons of different sampling techniques and their pros and cons.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue?
Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Reasoning Models]]></title><description><![CDATA[Reasoning models are a new class of LLMs designed to solve complex problems like math and coding.]]></description><link>https://oneminutenlp.com/p/reasoning-models</link><guid isPermaLink="false">https://oneminutenlp.com/p/reasoning-models</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Wed, 12 Mar 2025 01:05:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!inKQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!inKQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!inKQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!inKQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!inKQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!inKQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!inKQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72591,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/158210851?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!inKQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!inKQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!inKQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 
1272w, https://substackcdn.com/image/fetch/$s_!inKQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be0a2b6-8475-49ae-a809-66b6a1315576_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h1>Reasoning Models</h1><p><strong>Reasoning models</strong> are a new class of LLMs designed to solve complex problems like math and coding.
Unlike standard LLMs that generate an answer directly, reasoning models are designed and trained to first produce <strong>intermediate &#8220;thinking&#8221; steps</strong> (similar to <strong>chain-of-thought reasoning</strong>, but longer and more detailed) before finalizing a response. This makes them strong at multi-step logic tasks (e.g., math proofs, coding challenges), but less efficient for simpler tasks like translation. Reasoning models are trained using <strong>reinforcement learning</strong> (RL). For example, <strong>DeepSeek-R1</strong> was trained using a combination of RL from human feedback (RLHF) and RL using verifiable rewards (e.g., math correctness).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3hRh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3hRh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 424w, https://substackcdn.com/image/fetch/$s_!3hRh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 848w, https://substackcdn.com/image/fetch/$s_!3hRh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 1272w, https://substackcdn.com/image/fetch/$s_!3hRh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3hRh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png" width="875" height="221" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:221,&quot;width&quot;:875,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/158210851?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3hRh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 424w, https://substackcdn.com/image/fetch/$s_!3hRh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 848w, https://substackcdn.com/image/fetch/$s_!3hRh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3hRh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3376d4e6-f807-4541-be4c-cbccb8e32d24_875x221.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Simplified representation of DeepSeek-R1 training.</figcaption></figure></div><p>Reasoning models are typically slower and more expensive than standard LLMs because they require more output tokens for thinking steps.</p><h2>Further reading</h2><ul><li><p><a href="https://cameronrwolfe.substack.com/p/demystifying-reasoning-models">Demystifying Reasoning Models</a> by Cameron Wolfe &#8212; An easy-to-follow article on reasoning models that explains what they are and how they are trained. Includes lots of helpful references.</p></li><li><p><a href="https://arxiv.org/abs/2502.21321">LLM Post-Training: A Deep Dive into Reasoning Large Language Models</a> by Kumar et al. &#8212; If you want to dive deeper, this paper gives a thorough overview of post-training techniques, including the RL techniques that underpin models like DeepSeek-R1.</p></li><li><p><a href="https://platform.openai.com/docs/guides/reasoning-best-practices">Reasoning best practices</a> (OpenAI Platform documentation) &#8212; This page gives examples of problems where OpenAI&#8217;s reasoning models have been found to work well and includes tips for prompting reasoning models effectively.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept.
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Group Relative Policy Optimization]]></title><description><![CDATA[Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm that was used to train DeepSeek-R1.]]></description><link>https://oneminutenlp.com/p/group-relative-policy-optimization</link><guid isPermaLink="false">https://oneminutenlp.com/p/group-relative-policy-optimization</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Mon, 03 Mar 2025 05:19:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0UmV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0UmV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0UmV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0UmV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17056295-091c-4eae-853f-f9102b2bf356_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85424,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/156281372?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0UmV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!0UmV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17056295-091c-4eae-853f-f9102b2bf356_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Group Relative Policy Optimization</h1><p><strong>Group Relative Policy Optimization</strong> (GRPO) is a reinforcement learning (RL) algorithm that can replace <strong>Proximal Policy Optimization</strong> (PPO) when training LLMs with RL. In GRPO, the LLM being trained (also called the <strong>policy</strong>) generates multiple responses to the same prompt. Each response receives a <strong>reward</strong> (a score of its &#8216;quality&#8217;), and the group&#8217;s average reward serves as a baseline: each response&#8217;s <strong>advantage</strong> is its reward relative to that baseline, and these advantages are used to calculate gradients for the policy LLM. This eliminates the separate critic model that PPO requires to estimate advantages, thereby using much less memory and compute. 
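The group-relative advantage can be sketched in a few lines of Python (a simplified illustration with hypothetical reward values; real implementations operate on batched tensors):

```python
# Sketch of GRPO's group-relative advantage (illustrative only;
# the reward values below are hypothetical, not from a real reward model).
def group_advantages(rewards):
    n = len(rewards)
    mean = sum(rewards) / n
    # GRPO also normalizes by the group's reward standard deviation
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Four responses sampled from the policy for one prompt, each scored
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_advantages(rewards)
```

Responses scored above the group mean get a positive advantage and are reinforced; below-average responses are penalized.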
Like PPO, GRPO constrains the size of weight updates to prevent drastic behavior changes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YBb7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YBb7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 424w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 848w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 1272w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YBb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png" width="876" height="201" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/156281372?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YBb7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 424w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 848w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 1272w, https://substackcdn.com/image/fetch/$s_!YBb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00811f21-d759-4cba-ad60-322f89c9bfaf_876x201.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Steps involved in GRPO.</figcaption></figure></div><p>GRPO was used to train DeepSeek R1 in math and coding-based RL (using rule-based rewards) and RL from Human Feedback (RLHF). 
GRPO, like PPO, still depends heavily on the quality of the reward signal.</p><h2>Further reading</h2><ul><li><p><a href="https://arxiv.org/abs/2402.03300">DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models</a> by Shao et al. &#8212; This paper introduced GRPO. The power of the algorithm was later demonstrated in <a href="https://arxiv.org/abs/2501.12948">DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning</a> by Guo et al., which introduced the DeepSeek-R1 model.</p></li><li><p><a href="https://yugeten.github.io/posts/2025/01/ppogrpo/">A vision researcher&#8217;s guide to some RL stuff: PPO &amp; GRPO</a> by Yuge Shi &#8212; This blog post explains PPO and GRPO, and covers differences between the algorithms and how they are used in RLHF.</p></li><li><p>The <a href="https://huggingface.co/docs/trl/main/en/index">Transformer Reinforcement Learning</a> (TRL) library by Hugging Face includes a <a href="https://huggingface.co/docs/trl/main/en/grpo_trainer">GRPO Trainer</a>. The documentation includes an example showing how to use GRPO to train a model.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. 
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Proximal Policy Optimization]]></title><description><![CDATA[Proximal Policy Optimization is frequently used in Reinforcement Learning from Human Feedback to further train LLMs after supervised fine-tuning. 
It was used to train InstructGPT and ChatGPT.]]></description><link>https://oneminutenlp.com/p/proximal-policy-optimization</link><guid isPermaLink="false">https://oneminutenlp.com/p/proximal-policy-optimization</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Tue, 25 Feb 2025 03:39:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/75cd1962-cf68-4cc9-bef9-fc94da2f19b5_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ATXP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ATXP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ATXP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75136,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/156281287?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ATXP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ATXP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0182c7fe-45f8-4d2d-83d5-b8befdb8ecd7_960x720.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h1>Proximal Policy Optimization</h1><p><strong>Proximal Policy Optimization</strong> (PPO) is a reinforcement learning (RL) algorithm that is frequently used in <strong>Reinforcement Learning from Human Feedback</strong> (RLHF) to further train LLMs after supervised fine-tuning. In RLHF, an LLM output receives a numerical <strong>reward</strong> that represents human preference; the goal is to maximize this reward. PPO uses an <strong>actor-critic setup</strong>, training two models simultaneously. 
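The update on the actor relies on a clipped probability ratio between the new and old policy; a minimal sketch for a single token (the scalar inputs are hypothetical, not a full trainer):

```python
import math

# PPO's clipped surrogate objective for one token (illustrative sketch).
# logp_new/logp_old are the token's log-probabilities under the updated
# and old policy; the advantage would come from the critic's estimates.
def clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    ratio = math.exp(logp_new - logp_old)            # how far the policy moved
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)  # keep the ratio near 1
    # Pessimistic minimum: large policy shifts yield no extra gain
    return min(ratio * advantage, clipped * advantage)
```

Maximizing this objective (averaged over tokens) improves the policy while the clip keeps each update close to the previous policy.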
The actor (also called <strong>policy</strong>) is the LLM that generates tokens; the critic estimates the expected final reward at each token&#8212;these estimates are used to calculate gradients for the policy LLM. PPO constrains the size of weight updates on the policy LLM at each training step to prevent drastic and unpredictable behavior changes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Dg0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Dg0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 424w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 848w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 1272w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Dg0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png" width="873" height="217" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:217,&quot;width&quot;:873,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60611,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://oneminutenlp.substack.com/i/156281287?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Dg0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 424w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 848w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 1272w, https://substackcdn.com/image/fetch/$s_!0Dg0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90cec589-4bc9-4770-a7ce-db7d33c92c1e_873x217.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Simplified workflow depicting models involved in PPO (actor, critic, and reward model) and how they interact.</figcaption></figure></div><p>PPO is the standard algorithm for RLHF; it was 
used to fine-tune InstructGPT and ChatGPT. PPO doesn&#8217;t prevent <strong>reward hacking</strong>&#8212;achieving high rewards by learning to exploit imperfections in the reward model.</p><h2>Further reading</h2><ul><li><p>This series of three posts on RL for LLMs by Cameron Wolfe does a fantastic job of explaining both the intuition and the math behind RL algorithms in an accessible way: <a href="https://cameronrwolfe.substack.com/p/basics-of-reinforcement-learning">Basics of Reinforcement Learning for LLMs</a>, <a href="https://cameronrwolfe.substack.com/p/policy-gradients-the-foundation-of">Policy Gradients: The Foundation of RLHF</a>, <a href="https://cameronrwolfe.substack.com/p/proximal-policy-optimization-ppo">Proximal Policy Optimization (PPO): The Key to LLM Alignment</a>.</p></li><li><p><a href="https://iclr-blogposts.github.io/2024/blog/the-n-implementation-details-of-rlhf-with-ppo/">The N Implementation Details of RLHF with PPO</a> by Costa Huang et al. &#8212; This ICLR blog post walks through a Python implementation of RLHF with PPO.</p></li><li><p><a href="https://huggingface.co/docs/trl/en/detoxifying_a_lm">Detoxifying a Language Model using PPO</a> by Hugging Face shows how to use the TRL (Transformer Reinforcement Learning) library to apply PPO to reduce the toxicity of an LLM.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. 
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[RLHF]]></title><description><![CDATA[Reinforcement Learning from Human Feedback is a training phase used on LLMs after supervised fine-tuning to further improve LLM responses. 
It was one of the key innovations behind ChatGPT.]]></description><link>https://oneminutenlp.com/p/rlhf</link><guid isPermaLink="false">https://oneminutenlp.com/p/rlhf</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Sat, 15 Feb 2025 23:43:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ii2v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ii2v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ii2v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ii2v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ii2v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!ii2v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b0f464c-c319-4854-89bc-57a4d3f51b9c_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"></button></div></div></div></a></figure></div><h1>RLHF</h1><p><strong>Reinforcement Learning from Human Feedback</strong> (RLHF) is a training phase used on LLMs after supervised fine-tuning to further improve LLM responses. In contrast to supervised fine-tuning, where a model learns to mimic responses for given prompts, in RLHF a model learns by generating a response and then receiving a score indicating how good that response is. Because using humans to assign response scores is expensive, scores are typically generated using an LLM <strong>reward model</strong> (RM) trained on human preference pairs (prompt, winning response, losing response). 
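The reward model is typically trained on those preference pairs with a pairwise ranking (Bradley-Terry style) loss; a sketch with hypothetical scalar RM scores:

```python
import math

def preference_loss(score_winner, score_loser):
    # -log sigmoid(winner - loser): small when the RM already ranks
    # the winning response above the losing one, large otherwise.
    margin = score_winner - score_loser
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

loss_good = preference_loss(2.0, -1.0)  # RM agrees with the human label
loss_bad = preference_loss(-1.0, 2.0)   # RM disagrees -> much larger loss
```

Minimizing this loss over many labeled pairs teaches the RM to assign higher scores to responses humans prefer.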
RLHF uses a reinforcement learning algorithm like <strong>proximal policy optimization</strong> (PPO) to update model parameters based on the scores from the RM.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l_TW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l_TW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 424w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 848w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 1272w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l_TW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png" width="862" height="185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:185,&quot;width&quot;:862,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41373,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l_TW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 424w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 848w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 1272w, https://substackcdn.com/image/fetch/$s_!l_TW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e6451d-1420-41a0-acbb-0087a2ea57f8_862x185.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Typical steps involved in RLHF fine-tuning.</figcaption></figure></div><p>RLHF was one of the key innovations behind ChatGPT. Collecting high-quality data for training an RM can be very resource intensive. 
RLHF is often used to improve helpfulness and minimize harmfulness of model responses.</p><h2>Further reading</h2><ul><li><p><a href="https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html">Learning to summarize with human feedback</a> by Stiennon et al. and <a href="https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html">Training language models to follow instructions with human feedback</a> by Ouyang et al. were the first works that applied RLHF to LLMs. </p></li><li><p><a href="https://huyenchip.com/2023/05/02/rlhf.html">RLHF: Reinforcement Learning from Human Feedback</a> by Chip Huyen &#8212; This blog post provides an easy to follow explanation of RLHF and the intuition behind it. If you want to go a bit deeper, check out <a href="https://cameronrwolfe.substack.com/p/the-story-of-rlhf-origins-motivations">The Story of RLHF: Origins, Motivations, Techniques, and Modern Applications</a> by Cameron Wolfe.</p></li><li><p>If you prefer videos, have a look at <a href="https://www.youtube.com/watch?v=vJ4SsfmeQlk">Reinforcement Learning with Human Feedback (RLHF) in 4 minutes</a> by Sebastian Raschka.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. 
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><p></p>]]></content:encoded></item><item><title><![CDATA[ReAct Agent Model]]></title><description><![CDATA[ReAct (Reason + Act) is a design pattern for AI agents that incorporates planning and action execution. It has become a common way to implement agents.]]></description><link>https://oneminutenlp.com/p/react-agent-model</link><guid isPermaLink="false">https://oneminutenlp.com/p/react-agent-model</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Thu, 06 Feb 2025 03:45:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tU_v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tU_v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!tU_v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tU_v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63850,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!tU_v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!tU_v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25841394-8d8a-407b-80c7-f332ac9b1152_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h1>ReAct Agent Model</h1><p><strong>ReAct</strong> is an agent design pattern that incorporates <strong>reasoning</strong> (thinking through steps logically) and <strong>action</strong> (interacting with the environment). Prior prompting strategies either generate reasoning (e.g., chain-of-thought) or take actions (e.g., tool use) separately. In ReAct, an LLM is prompted to plan its next action(s), take action(s), and reflect on results &#8211; this is repeated until the agent considers the task complete. ReAct has been shown to perform well on complex tasks like multi-hop question answering and has become a common agent pattern built upon by other approaches like <strong>Reflexion</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6_78!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6_78!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 424w, https://substackcdn.com/image/fetch/$s_!6_78!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 848w, 
https://substackcdn.com/image/fetch/$s_!6_78!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 1272w, https://substackcdn.com/image/fetch/$s_!6_78!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6_78!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png" width="809" height="211" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfacb559-4388-4826-a442-ceaaef07deaf_809x211.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:211,&quot;width&quot;:809,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54175,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6_78!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 424w, https://substackcdn.com/image/fetch/$s_!6_78!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 848w, 
https://substackcdn.com/image/fetch/$s_!6_78!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 1272w, https://substackcdn.com/image/fetch/$s_!6_78!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfacb559-4388-4826-a442-ceaaef07deaf_809x211.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Typical steps implemented in ReAct.</figcaption></figure></div><p>ReAct may struggle if the outcomes of the actions it takes are incomplete or incorrect. It can also incur higher cost and latency than simpler methods because it requires more steps and tokens, and it has been shown to perform best when combined with fine-tuning.</p><h2>Further reading</h2><ul><li><p><a href="https://openreview.net/forum?id=WE_vluYUL-X">ReAct: Synergizing Reasoning and Acting in Language Models</a> by Yao et al. &#8212; This paper introduced the ReAct Agent Model. <a href="https://research.google/blog/react-synergizing-reasoning-and-acting-in-language-models/">This blog post</a> by the paper&#8217;s authors provides a concise summary of the paper.</p></li><li><p>Both Hugging Face <a href="https://huggingface.co/docs/transformers/main/agents">Transformers</a> and LangChain&#8217;s <a href="https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/">LangGraph</a> provide implementations of ReAct. 
You can view the <a href="https://smith.langchain.com/hub/hwchase17/react">ReAct prompt used by LangChain</a> in the LangChain Hub.</p></li><li><p><a href="https://huyenchip.com/2025/01/07/agents.html#agent_overview">Agents</a> by Chip Huyen &#8212; this fantastic blog post covers much more than just ReAct, but if you are looking to understand agents and where ReAct fits in, this is a great place to start.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? 
Let me know!</p>]]></content:encoded></item><item><title><![CDATA[Knowledge Distillation]]></title><description><![CDATA[Knowledge Distillation is a popular technique for transferring knowledge from large, powerful models to smaller, more efficient models.]]></description><link>https://oneminutenlp.com/p/knowledge-distillation</link><guid isPermaLink="false">https://oneminutenlp.com/p/knowledge-distillation</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Wed, 29 Jan 2025 05:01:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iCgc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iCgc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!iCgc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!iCgc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iCgc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iCgc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iCgc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!iCgc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!iCgc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iCgc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e2ced2-7475-4230-81a1-dad84ab9cad9_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h1>Knowledge Distillation</h1><p><strong>Knowledge Distillation</strong> (KD) is a form of model compression used to transfer knowledge from a large, powerful <strong>teacher</strong> model to a (typically) smaller, more efficient <strong>student</strong> model. 
In contrast to supervised learning where a model is trained using labeled data (inputs and expected outputs &#8212; also known as <strong>hard targets</strong>), in KD, the student is typically also trained using the teacher&#8217;s reasoning (<strong>soft targets</strong>). Different methods exist for extracting the teacher&#8217;s reasoning, such as using its weights, analyzing the probabilities it assigns to possible outputs, or generating a rationale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7LtA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7LtA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 424w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 848w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 1272w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7LtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png" width="933" height="262" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc664937-34d8-473b-afee-229d10b70823_933x262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:262,&quot;width&quot;:933,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78544,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7LtA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 424w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 848w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 1272w, https://substackcdn.com/image/fetch/$s_!7LtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc664937-34d8-473b-afee-229d10b70823_933x262.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Two example KD methods &#8212; extracting a teacher&#8217;s reasoning from Chain of Thought prompting and from the teacher&#8217;s output distribution.</figcaption></figure></div><p>DistilBERT, a popular Transformer model, is a distilled version of BERT. It&#8217;s 40% smaller and 60% faster, while retaining over 95% of BERT&#8217;s performance. Some speculate that GPT-4o is a distilled version of a larger model.</p><h2>Further reading</h2><ul><li><p><a href="https://arxiv.org/abs/1503.02531">Distilling the Knowledge in a Neural Network</a> by Hinton et al. 
&#8212; This seminal paper formulated the concept of knowledge distillation.</p></li><li><p><a href="https://arxiv.org/abs/2402.13116">A Survey on Knowledge Distillation of Large Language Models</a> by Xu et al. &#8212; If you want to dive deeper, this recent survey provides an overview of different methods for LLM distillation.</p></li><li><p>The SetFit library by HuggingFace for fine-tuning sentence transformers provides a <a href="https://huggingface.co/docs/setfit/en/how_to/knowledge_distillation">Knowledge Distillation</a> guide. OpenAI also provides an <a href="https://openai.com/index/api-model-distillation/">API for knowledge distillation</a>.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? 
Let me know!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading One Minute NLP! Subscribe for free to receive weekly new posts in your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Mixture of Experts]]></title><description><![CDATA[Mixture of Experts (MoE) is an ensemble learning technique that enables creating larger models without increasing training and inference cost.]]></description><link>https://oneminutenlp.com/p/mixture-of-experts</link><guid isPermaLink="false">https://oneminutenlp.com/p/mixture-of-experts</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Sun, 19 Jan 2025 04:40:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5G5Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5G5Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5G5Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5G5Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf75c095-c525-43c7-afb4-360511b810a2_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5G5Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!5G5Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf75c095-c525-43c7-afb4-360511b810a2_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h1>Mixture of Experts</h1><p>Mixture of Experts (MoE) is an ensemble learning technique that trains multiple &#8220;expert&#8221; (sub-)models and a &#8220;gate&#8221; network that determines which expert(s) to use for a particular input. In language models, MoE is typically implemented as a sparse layer composed of multiple expert sub-layers and a router; each expert can be a simple feed-forward network (FFN) or a more complex network, and the router is typically a linear layer with a softmax function. MoE enables larger models (more parameters) while keeping training and inference cheap, because only a subset of the experts is active for a given input (this is called conditional computation).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UVYd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UVYd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 424w, https://substackcdn.com/image/fetch/$s_!UVYd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 848w, 
https://substackcdn.com/image/fetch/$s_!UVYd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 1272w, https://substackcdn.com/image/fetch/$s_!UVYd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UVYd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png" width="809" height="218" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a74230e-860f-4216-b096-cccdf790581d_809x218.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:218,&quot;width&quot;:809,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85323,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UVYd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 424w, https://substackcdn.com/image/fetch/$s_!UVYd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 848w, 
https://substackcdn.com/image/fetch/$s_!UVYd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 1272w, https://substackcdn.com/image/fetch/$s_!UVYd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a74230e-860f-4216-b096-cccdf790581d_809x218.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Comparison between a dense layer and a sparse layer used in MoE.</figcaption></figure></div><p>Mixtral 8x7B is an example of a MoE. In Mixtral, the feed forward layer of each transformer block is replaced by a MoE layer. Each token in a given input sequence activates a different set of experts.</p><h2>Further reading</h2><ul><li><p><a href="https://huggingface.co/blog/moe">Mixture of Experts Explained</a> by Sanseviero et al. &#8212; This article provides an overview of MoE including the history, challenges, and current developments.</p></li><li><p><a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts">A Visual Guide to Mixture of Experts (MoE)</a> by Maarten Grootendorst &#8212; If you are a visual learner, this guide does a great job of breaking the technique down into individual components and explaining the intuition behind them.</p></li><li><p><a href="https://arxiv.org/abs/2407.06204">A Survey on Mixture of Experts</a> by Cai et al. &#8212; If you want to dive deeper, this survey covers the different ways of building MoE models.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. 
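Returning to the MoE layer described above (router as a linear layer plus softmax, only the top-k experts active per token), the computation can be sketched in a few lines of NumPy. This is a toy illustration with made-up shapes, not Mixtral's actual implementation, and each "expert" is reduced to a single linear map standing in for an FFN:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2                 # hidden size, expert count, experts per token

# Each expert is a stand-in for an FFN (a single linear map here, for brevity).
experts = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))     # router: a linear layer followed by softmax

def moe_layer(x):
    logits = x @ router_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                         # softmax over the experts
    chosen = np.argsort(gate)[-top_k:]         # conditional computation: keep only top-k
    w = gate[chosen] / gate[chosen].sum()      # renormalize the kept gate weights
    return sum(wi * (x @ experts[i]) for i, wi in zip(chosen, w))

token = rng.normal(size=d)
out = moe_layer(token)                         # only top_k of the n_experts ran for this token
```

Because only `top_k` expert matrices are multiplied per token, compute scales with the number of active experts rather than the total parameter count.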
To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading One Minute NLP! Subscribe for free to receive weekly new posts in your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Low-Rank Adaptation]]></title><description><![CDATA[Low-Rank Adaptation (LoRA) is a popular method for Parameter-Efficient Fine-Tuning of Large Language Models. 
LoRA significantly improves fine-tuning efficiency and decreases storage requirements.]]></description><link>https://oneminutenlp.com/p/low-rank-adaptation</link><guid isPermaLink="false">https://oneminutenlp.com/p/low-rank-adaptation</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Thu, 25 Jul 2024 20:04:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0PIL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0PIL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0PIL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0PIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0PIL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!0PIL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f91e368-9eb2-49c1-a915-62fe1860572f_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Low-Rank Adaptation</h1><p>Low-Rank Adaptation (LoRA) is a popular method for Parameter-Efficient Fine-Tuning (PEFT) of Large Language Models. Fine-tuning an LLM can significantly improve performance but can be prohibitively costly due to model size. Instead of updating all weights, LoRA freezes the original weights <em>W</em> and trains only a weight update <em>&#916;W</em>, represented as the product of two smaller rank-decomposition matrices <em>A</em> and <em>B</em>, which are far more efficient to train and store. 
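The decomposition just described can be sketched in NumPy (a minimal illustration of the idea, not the actual PEFT implementation; the dimension and rank below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # model dimension and LoRA rank, with r << d

W = rng.normal(size=(d, d))          # pretrained weights: frozen, never updated
A = rng.normal(size=(r, d)) * 0.01   # trainable rank-decomposition matrix A
B = np.zeros((d, r))                 # trainable B, zero-initialized so dW = B @ A starts at 0

def lora_forward(x):
    # Forward pass: W x plus the low-rank update (B A) x.
    return W @ x + B @ (A @ x)

# Only A and B are trained: 2*r*d parameters instead of d*d.
trainable = A.size + B.size          # 8_192 vs 262_144 full parameters
x = rng.normal(size=d)
y = lora_forward(x)                  # equals W @ x at initialization, since B = 0
```

Zero-initializing B means the adapted model starts out exactly equal to the pretrained model, which is also how the original paper initializes the update.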
During inference, this weight update is added to the original weights.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LXL5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LXL5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 424w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 848w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 1272w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LXL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png" width="1284" height="364" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1284,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LXL5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 424w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 848w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 1272w, https://substackcdn.com/image/fetch/$s_!LXL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ab94b67-4efa-45e2-9a77-3121d6c88c45_1284x364.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparison between full fine-tuning and LoRA adaptation.</figcaption></figure></div><p>For example, LoRA can reduce the number of trainable parameters of GPT-3 from 175B to 37.7M while performing as well as if all weights were fully fine-tuned. LoRA can be applied to any subset of weight matrices of a model (most commonly the attention and/or the feedforward layers). Multiple LoRA modules can be trained for different tasks and swapped during inference.</p><h2>Further Reading </h2><ul><li><p><a href="https://arxiv.org/abs/2106.09685">LoRA: Low-Rank Adaptation of Large Language Models</a> by Hu et al. &#8212; This paper first introduced the LoRA technique.</p></li><li><p><a href="https://arxiv.org/pdf/2303.15647">Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning</a> by Lialin et al. 
&#8212; this paper presents a detailed survey (albeit a bit outdated) and a taxonomy of parameter-efficient fine-tuning methods.</p></li><li><p><a href="https://huggingface.co/docs/peft/">PEFT</a> is an implementation of various Parameter-Efficient Fine-Tuning techniques including LoRA built by Hugging Face.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? Let me know!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading One Minute NLP! 
Subscribe for free to receive weekly new posts in your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Temperature Sampling]]></title><description><![CDATA[Temperature is a common LLM hyperparameter that controls the randomness of the model's output. This post explains how temperature sampling works.]]></description><link>https://oneminutenlp.com/p/temperature-sampling</link><guid isPermaLink="false">https://oneminutenlp.com/p/temperature-sampling</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Mon, 15 Jul 2024 03:48:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-KyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-KyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-KyL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 424w, 
https://substackcdn.com/image/fetch/$s_!-KyL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!-KyL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!-KyL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-KyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67544,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-KyL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 424w, 
https://substackcdn.com/image/fetch/$s_!-KyL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!-KyL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!-KyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e30b6b2-1371-4b81-ba50-9b2e90d3ad4c_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Sampling is a common method used by LLMs for generating output tokens: the next token is randomly chosen based on the token probabilities learned by the LLM. Temperature sampling reshapes the probability distribution over tokens by introducing a scaling factor &#964; (temperature):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y=\\text{softmax}(\\frac{\\text{logits}}{\\tau})&quot;,&quot;id&quot;:&quot;BEWWBAKIOK&quot;}" data-component-name="LatexBlockToDOM"></div><p>i.e., the raw model scores are divided by &#964; before normalizing with softmax.</p><p>If &#964;=1.0, the probabilities are unchanged. As &#964; approaches 0, the probability of high-probability words increases and the probability of low-probability words decreases (i.e., randomness is reduced and the model is more likely to pick a high-probability word). Setting &#964; to a value greater than 1 has the opposite effect. 
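The scaling step above can be sketched in a few lines of NumPy (an illustrative snippet, not code from the post; the toy logits are made up):

```python
import numpy as np

def temperature_softmax(logits, tau):
    # Divide the raw model scores by the temperature, then normalize with softmax.
    z = np.asarray(logits, dtype=float) / tau
    z -= z.max()                                # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]                        # toy scores for a 3-token vocabulary
p_cold = temperature_softmax(logits, 0.5)       # tau < 1: distribution sharpens
p_base = temperature_softmax(logits, 1.0)       # tau = 1: plain softmax, unchanged
p_hot  = temperature_softmax(logits, 2.0)       # tau > 1: distribution flattens
```

Comparing the three distributions, the highest-scoring token's probability grows as tau shrinks and the distribution approaches uniform as tau grows.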
Top-k and top-p sampling are alternatives to and can be used in conjunction with temperature sampling.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nQmh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nQmh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 424w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 848w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 1272w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nQmh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png" width="1377" height="294" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:294,&quot;width&quot;:1377,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nQmh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 424w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 848w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 1272w, https://substackcdn.com/image/fetch/$s_!nQmh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9db3f5-e8be-42ed-b07f-557a3662e00d_1377x294.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Example probabilities based on different temperature settings</figcaption></figure></div><h2>Further Reading </h2><ul><li><p><a href="https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/">Your settings are (probably) hurting your model - Why sampler settings matter</a> (reddit.com/r/LocalLLaMA post by 
kindacognizant) &#8212; this post provides a fantastic explanation of how different temperature values affect model outputs and how temperature works together with top-k and top-p sampling.</p></li><li><p><a href="https://arxiv.org/abs/2402.05201">The Effect of Sampling Temperature on Problem Solving in Large Language Models</a> by Renze and Guven &#8212; This recent work presents a detailed investigation of the effect of different temperature settings on LLM performance on MCQA problems.</p></li><li><p><a href="https://web.stanford.edu/~jurafsky/slp3/">Speech and Language Processing</a> by Jurafsky and Martin (free to read online) &#8212; Section 10.8 (Large Language Models: Generation by Sampling) provides a great explanation of different sampling techniques.</p></li></ul><div><hr></div><h3>Do you want to learn more NLP concepts?</h3><p>Each week I pick one core NLP concept and create a one-slide, one-minute explanation of the concept. To receive weekly new posts in your inbox, subscribe here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://oneminutenlp.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Reach out to me:</p><ul><li><p>Connect with me on <a href="https://www.linkedin.com/in/herrmannova/">LinkedIn</a></p></li><li><p>Read my technical blog on <a href="https://medium.com/@robodasha">Medium</a></p></li><li><p>Or send me a message by responding to this post</p></li></ul><p>Is there a concept you would like me to cover in a future issue? 
Let me know!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://oneminutenlp.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading One Minute NLP! Subscribe for free to receive weekly new posts in your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Byte-Pair Encoding Algorithm]]></title><description><![CDATA[The focus of the second issue of One Minute NLP is BPE, a popular tokenization algorithm which is used by most LLMs including GPT, Llama, and Mistral.]]></description><link>https://oneminutenlp.com/p/byte-pair-encoding-algorithm</link><guid isPermaLink="false">https://oneminutenlp.com/p/byte-pair-encoding-algorithm</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Mon, 08 Jul 2024 02:58:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IPqB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IPqB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IPqB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IPqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!IPqB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!IPqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82dd9195-5170-452a-95c5-31c67d68c38d_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Byte-Pair Encoding Algorithm</h1><p>Byte-Pair Encoding (BPE) is a tokenization algorithm used by most LLMs (e.g., GPT, Llama) to build their vocabularies. Tokens created by BPE are called <strong>subwords</strong> because they are often smaller than words; they are typically the most frequent character sequences in the pretraining corpus. To determine these subwords, BPE starts with individual characters as the token set and then iteratively merges the most frequent pair of consecutive tokens into a new token until a target vocabulary size is reached. BPE can handle unknown words at inference time because any new word can be represented as some sequence of existing subwords. <strong>WordPiece</strong> and <strong>SentencePiece</strong> tokenization are popular alternatives to BPE and work similarly.</p><p>Consider an example corpus: &#8220;llama llama red pajama&#8221;.</p><p>In BPE tokenization, the starting vocabulary consists of individual characters. 
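As a quick illustration, the merge loop can be sketched in a few lines of Python. This is a simplified toy version, not a production tokenizer: real BPE implementations typically work on bytes or pre-split words and define explicit tie-breaking rules, whereas here ties simply go to the pair encountered first.

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Learn BPE merges from a whitespace-split corpus (toy version)."""
    # Start from single characters: "llama" -> ["l", "l", "a", "m", "a"]
    words = [list(word) for word in corpus.split()]
    merges = []
    for _ in range(num_merges):
        # Count every pair of consecutive tokens across the corpus.
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        # Most frequent pair; ties go to the pair seen first.
        a, b = max(pairs, key=pairs.get)
        merges.append((a, b))
        # Replace every occurrence of the pair with the merged token.
        for w in words:
            i = 0
            while i < len(w) - 1:
                if (w[i], w[i + 1]) == (a, b):
                    w[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges, words

merges, words = bpe_train("llama llama red pajama", num_merges=5)
print(merges)                                # learned merges, in order
print(" ".join("-".join(w) for w in words))  # llama llama re-d p-a-j-ama
```

Run with five merges on the example corpus, this sketch reproduces the iterations traced below.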
The corresponding corpus representation would look like this (dashes indicate token boundaries):</p><blockquote><p><strong>Iteration:</strong> 0</p><p><strong>Vocabulary:</strong> l a m r e d p j</p><p><strong>Corpus representation:</strong> l-l-a-m-a l-l-a-m-a r-e-d p-a-j-a-m-a</p></blockquote><p>Vocabulary and corpus representation after the first iteration (a+m were the most frequent consecutive tokens and are merged into a new token am):</p><blockquote><p><strong>Iteration:</strong> 1</p><p><strong>Vocabulary:</strong> l a m r e d p j am </p><p><strong>Corpus representation:</strong> l-l-am-a l-l-am-a r-e-d p-a-j-am-a</p></blockquote><p>Vocabulary and corpus representation after the second iteration:</p><blockquote><p><strong>Iteration:</strong> 2</p><p><strong>Vocabulary:</strong> l a m r e d p j am ama</p><p><strong>Corpus representation:</strong> l-l-ama l-l-ama r-e-d p-a-j-ama</p></blockquote><p>Vocabulary and corpus representation after the fifth iteration:</p><blockquote><p><strong>Iteration:</strong> 5</p><p><strong>Vocabulary:</strong> l a m r e d p j am ama ll llama re</p><p><strong>Corpus representation:</strong> llama llama re-d p-a-j-ama</p></blockquote><h2>Further Reading</h2><ul><li><p><a href="https://aclanthology.org/P16-1162/">Neural Machine Translation of Rare Words with Subword Units</a> by Sennrich et al. 
&#8212; This work was the first to introduce the BPE algorithm to NLP through an application in Neural Machine Translation.</p></li><li><p><a href="https://huggingface.co/docs/transformers/en/tokenizer_summary">Summary of the Tokenizers</a> (Hugging Face docs) &#8212; this page provides a fantastic overview of different tokenization techniques, including a detailed explanation of the BPE algorithm.</p></li><li><p>There are many implementations of BPE; three notable ones are the <a href="https://github.com/openai/tiktoken">tiktoken</a> library (used by OpenAI&#8217;s models), <a href="https://github.com/karpathy/minbpe">minbpe</a> (Andrej Karpathy&#8217;s minimal implementation), and <a href="https://github.com/google/sentencepiece">sentencepiece</a> (Google&#8217;s implementation of BPE which, unlike the original implementation, doesn&#8217;t require input text to be split into words).</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Perplexity]]></title><description><![CDATA[Welcome to the first issue of One Minute NLP! The focus of this issue is perplexity, a metric commonly used to evaluate language models.]]></description><link>https://oneminutenlp.com/p/perplexity</link><guid isPermaLink="false">https://oneminutenlp.com/p/perplexity</guid><dc:creator><![CDATA[Dasha Herrmannova]]></dc:creator><pubDate>Fri, 28 Jun 2024 12:03:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eJMT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eJMT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!eJMT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eJMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!eJMT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!eJMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f1c6d1-573d-4bf7-aec8-a811bd5413a2_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Perplexity</h2><p>Perplexity (usually abbreviated <em>PP</em> or <em>PPL</em>) is a metric commonly used to evaluate language models. Perplexity measures how uncertain (or &#8220;perplexed&#8221;) a model is about the predictions it makes. The lower the perplexity, the better a model predicts the test set. Perplexity usually correlates well with improvements on real-world tasks, but lower perplexity is not a guarantee of better task performance. The perplexity of two models is only comparable if the models use the same vocabulary.</p><p>Given an example text sequence <em>X</em>: <em>The quick fox jumps over the lazy dog, </em>we can calculate <em>PP</em> from the average log-probability of predicting each word given the words that came before it: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;PP(X)=2^{-l},&quot;,&quot;id&quot;:&quot;EHGDYRKCSY&quot;}" data-component-name="LatexBlockToDOM"></div><p>where</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;l = \\frac{\\log{p(\\textup{The})}+\\log{p(\\textup{quick}|\\textup{The})}+\\log{p(\\textup{fox}|\\textup{The quick})}+\\cdots}{N}&quot;,&quot;id&quot;:&quot;VWGARCPIYQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and <em>N</em> is the number of words in the sequence (<em>N=8</em> for our test sequence).</p><p>Formally:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;PP(X)=2^{-\\frac{1}{N}\\sum^{N}_{i}{\\log{p(x_i|x_1,x_2,\\cdots,x_{i-1})}}}&quot;,&quot;id&quot;:&quot;RYRSGLIRVE&quot;}" data-component-name="LatexBlockToDOM"></div><h3>Further Reading</h3><ul><li><p><a 
href="https://web.stanford.edu/~jurafsky/slp3/">Speech and Language Processing</a> by Jurafsky and Martin (free to read online) &#8212; Section 3.3 (Evaluating Language Models: Perplexity) provides a great explanation of the metric.</p></li><li><p><a href="https://arxiv.org/abs/2405.14782">Lessons from the Trenches on Reproducible Evaluation of Language Models</a> by Biderman et al. &#8212; this paper includes some practical considerations for using perplexity to compare the language modeling performance of different models (Appendix A.3).</p></li><li><p><a href="https://huggingface.co/docs/transformers/en/perplexity">Perplexity of fixed-length models</a> (Hugging Face docs) &#8212; how to calculate perplexity with the Hugging Face Transformers library.</p></li></ul>]]></content:encoded></item></channel></rss>