The Eagle/Finch Paper

Title

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Who

A collaborative effort among researchers from the RWKV Project (under Linux Foundation AI & Data), EleutherAI, and several universities, including Ohio State University, the University of California, Santa Barbara, and the University of British Columbia.

Why

The research aims to address a core limitation of traditional transformer architectures: self-attention's quadratic time and memory cost with respect to input sequence length. The goal is to match transformer-level quality while keeping RNN-style efficiency (linear-time inference with a fixed-size state), making LLMs more practical for tasks involving long sequences.

How

  • Experiment Design: The researchers developed two new RNN architectures: Eagle (RWKV-5) and Finch (RWKV-6).

  • Key Variables & Models: The study focuses on two architectural changes and their impact on LLM performance: Eagle replaces RWKV-4's vector-valued recurrent state with multi-headed matrix-valued states, and Finch adds a dynamic, data-dependent recurrence mechanism on top (a toy recurrence sketch follows this list).

  • Datasets: A new multilingual corpus with 1.12 trillion tokens, "RWKV World v2", was created to train the models.

  • Techniques & Innovations: The research makes the time-mixing and token-shift modules data-dependent, using small Low-Rank Adaptation (LoRA)-style projections to compute context-dependent adjustments to the mixing weights (a token-shift sketch also follows this list).
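To make the matrix-valued state idea concrete, here is a toy, single-head PyTorch sketch of a recurrence in that general style: the state is a head_dim × head_dim matrix updated with an outer product per token and decayed each step. It is a simplified illustration under my own naming, not the paper's actual kernel or exact formulation.

```python
import torch

def matrix_state_recurrence(r, k, v, w, u):
    """Toy single-head recurrence with a matrix-valued state.

    Instead of a vector-valued state per channel, the state S is a
    (head_dim x head_dim) matrix, updated with the outer product of k_t and
    v_t and decayed by a per-channel factor w each step. Shapes and names
    are illustrative, not the paper's exact kernel.
        r, k, v: (seq_len, head_dim)
        w, u:    (head_dim,) per-channel decay and current-token bonus
    """
    seq_len, d = k.shape
    S = torch.zeros(d, d)
    outputs = []
    for t in range(seq_len):
        kv = torch.outer(k[t], v[t])           # rank-1 update of the state
        out = r[t] @ (S + torch.diag(u) @ kv)  # read out, with a bonus for the current token
        outputs.append(out)
        S = torch.diag(w) @ S + kv             # decay old information, add the new pair
    return torch.stack(outputs)

# tiny smoke test
d, T = 8, 16
out = matrix_state_recurrence(torch.randn(T, d), torch.randn(T, d),
                              torch.randn(T, d), torch.rand(d), torch.rand(d))
print(out.shape)  # torch.Size([16, 8])
```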

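And here is a minimal sketch of the LoRA-style, data-dependent token shift: the interpolation weight between the current and previous token is no longer a fixed learned vector but is adjusted per token by a small low-rank projection. Class and parameter names are my own; this illustrates the idea rather than reproducing the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DataDependentTokenShift(nn.Module):
    """Sketch of a data-dependent token shift.

    The mix between x_t and x_{t-1} is adjusted per token by a cheap
    low-rank (LoRA-style) projection: lambda + tanh(x A) B.
    Dimensions and names are illustrative.
    """

    def __init__(self, d_model: int, lora_rank: int = 32):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_model))            # static mixing weight
        self.lam = nn.Parameter(torch.zeros(d_model))           # static part of the LoRA output
        self.A = nn.Parameter(torch.zeros(d_model, lora_rank))  # low-rank "down" projection
        self.B = nn.Parameter(torch.zeros(lora_rank, d_model))  # low-rank "up" projection

    def lora(self, x: torch.Tensor) -> torch.Tensor:
        # context-dependent adjustment: lambda + tanh(x A) B
        return self.lam + torch.tanh(x @ self.A) @ self.B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); x_prev is x shifted right by one token
        x_prev = F.pad(x, (0, 0, 1, -1))
        coarse = x + (x_prev - x) * self.mu          # first guess with the static weight
        return x + (x_prev - x) * self.lora(coarse)  # final, data-dependent interpolation

# tiny smoke test
shift = DataDependentTokenShift(d_model=64)
print(shift(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```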
What did they find

  • Eagle and Finch models achieve competitive performance on various benchmarks, including multilingual and English-focused language tasks, associative recall, music modeling, and vision-language tasks.

  • Finch demonstrates exceptional accuracy on multi-query associative recall (MQAR), a synthetic benchmark widely used as a proxy for in-context recall ability and therefore an important check when developing new architectures, surpassing other non-transformer architectures (a toy example of the task format appears below).

  • Both architectures show lower loss on long-sequence tasks than RWKV-4.

Figure 5: Loss along sequence offset for 3B RWKV-4 World, Eagle, and Finch on the PG19 dataset; all models were pretrained with a context length of 4096. Eagle and Finch extrapolate to 100k tokens "for free."

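For readers unfamiliar with MQAR: the model is shown a list of key-value pairs and then asked to recall the values for several queried keys within the same sequence. Below is a toy sketch of the task format (my own illustration, not the benchmark's official generator).

```python
import random

def make_mqar_example(num_pairs: int = 4, num_queries: int = 3, seed: int = 0):
    """Build one toy multi-query associative recall (MQAR) example.

    The model sees key-value pairs, then several queried keys, and must
    produce the matching values. The token format here is illustrative only.
    """
    rng = random.Random(seed)
    keys = rng.sample([f"k{i}" for i in range(100)], num_pairs)
    values = rng.sample([f"v{i}" for i in range(100)], num_pairs)
    pairs = dict(zip(keys, values))

    queries = rng.sample(keys, num_queries)
    answers = [pairs[q] for q in queries]

    prompt = " ".join(f"{k} {v}" for k, v in pairs.items()) + " | " + " ".join(queries)
    return prompt, answers

prompt, answers = make_mqar_example()
print(prompt)   # key-value pairs, a separator, then the queried keys
print(answers)  # the values the model should recall, in query order
```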
What are the limitations and what's next

  • Limitations: The models struggle with embedding tasks and exhibit some ChatGPT-like behavior because the training data contains synthetic data generated by GPT-3.5 and ChatGPT.

  • Future Work: Planned directions include expanding the training corpus in size and diversity, training larger versions of Finch, and exploring Mixture of Experts (MoE) for further efficiency gains.

Why it matters

This research demonstrates the potential of RNNs as a competitive alternative to transformers in LLMs, offering comparable performance while remaining efficient to train and run. The Eagle and Finch architectures, together with the release of pre-trained models and an open-source training pipeline, contribute to the advancement of more efficient and accessible AI models.

Additional Notes

  • The models and training code are available on Hugging Face and GitHub under the Apache 2.0 license (a minimal loading sketch follows below).

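As a starting point, checkpoints in the RWKV Hugging Face organization can typically be loaded with the standard `transformers` causal-LM classes. The model ID below is an assumed placeholder for illustration; check the RWKV org for the actual repository names, and note that RWKV models generally require `trust_remote_code=True`.

```python
# Minimal sketch: loading an RWKV (Eagle/Finch) checkpoint via Hugging Face.
# The model ID is an assumed placeholder; see the RWKV org for real repo names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-5-world-1b5"  # assumption, for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Eiffel Tower is located in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```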
Disclosure: Entities affiliated with Ate-A-Pi have commercial advisory relationships with entities affiliated with the researchers mentioned above.
