Low-Rank Adaptation
Low-Rank Adaptation (LoRA) is a popular method for Parameter-Efficient Fine-Tuning (PEFT) of Large Language Models. Fine-tuning an LLM can significantly improve performance, but updating all of its weights is prohibitively costly at modern model sizes. LoRA instead freezes the original weights W and trains only a weight update ΔW, represented as the product of two rank-decomposition matrices, ΔW = BA, whose rank r is far smaller than the dimensions of W. These two small matrices are much cheaper to train and store. During inference, the weight update is added to the original weights.
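To make this concrete, here is a minimal sketch of a LoRA-wrapped linear layer in PyTorch. The class name LoRALinear and the hyperparameter defaults are illustrative choices for this sketch, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights W

        d_out, d_in = base.weight.shape
        # Rank decomposition of the update: ΔW = B @ A, with r ≪ min(d_in, d_out).
        # As in the paper, A starts random and B starts at zero, so ΔW = 0 at first.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path (W x) plus the trainable low-rank path ((BA) x), scaled.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Wrapping a single 768×768 attention projection this way with r = 8 leaves only 2 · 768 · 8 ≈ 12K trainable parameters instead of 768² ≈ 590K.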
For example, LoRA can reduce the number of trainable parameters of GPT-3 from 175B to 37.7M while matching the performance of full fine-tuning. LoRA can be applied to any subset of a model's weight matrices (most commonly the attention and/or feed-forward layers). Multiple LoRA modules can be trained for different tasks and swapped in at inference time, as sketched below.
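Merging and swapping come down to plain tensor arithmetic. Assuming the W, A, B, and scaling conventions from the sketch above, these hypothetical helpers show both directions:

```python
import torch

@torch.no_grad()
def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, scaling: float) -> None:
    # Fold the update into the weights in place: W ← W + scaling · BA.
    # After merging, inference is a single matrix multiply, with no added latency.
    W.add_(scaling * (B @ A))

@torch.no_grad()
def unmerge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor, scaling: float) -> None:
    # Restore the original weights: W ← W − scaling · BA.
    W.sub_(scaling * (B @ A))

# Swapping tasks: remove one adapter's update, then apply another's.
# unmerge_lora(W, A_task1, B_task1, scaling)
# merge_lora(W, A_task2, B_task2, scaling)
```

Because BA has the same shape as W, a merged model is indistinguishable in structure from the original, and switching tasks only requires storing each task's small A and B matrices.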
Further Reading
LoRA: Low-Rank Adaptation of Large Language Models by Hu et al. — This paper first introduced the LoRA technique.
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning by Lialin et al. — This paper presents a detailed survey (albeit somewhat outdated by now) and a taxonomy of parameter-efficient fine-tuning methods.
PEFT by Hugging Face — a library implementing various Parameter-Efficient Fine-Tuning techniques, including LoRA. A minimal usage sketch follows below.
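As a rough sketch of what using the PEFT library looks like (the base model and the target_modules choice here are illustrative; check the PEFT documentation for your architecture):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

config = LoraConfig(
    r=8,                        # rank of the decomposition
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # GPT-2's attention projection; model-specific
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA matrices remain trainable
```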
Do you want to learn more NLP concepts?
Each week I pick one core NLP concept and create a one-slide, one-minute explanation of it. To receive new posts in your inbox every week, subscribe here:
Reach out to me:
Connect with me on LinkedIn
Read my technical blog on Medium
Or send me a message by responding to this post
Is there a concept you would like me to cover in a future issue? Let me know!