3476, 477, 12274, 112838, 248

Introduction When working with Large Language Models, we often focus on their remarkable capabilities, from writing code to explaining complex concepts. However, there’s a crucial component that can significantly impact their behavior and performance: tokenization 🍣. As highlighted in recent work by Garreth Lee and the Hugging Face team 🤗 1, even state-of-the-art models can stumble on seemingly simple tasks due to tokenization choices. For instance, many models struggle with the basic question “Which is bigger?...

January 26, 2025 · 7 min · Andrea Gemelli

Introduction to Retrieval Augmented Generation

Introduction Recently my PhD supervisor called me, asking: “Would you like to come to one of my lectures and present one of your use cases to the students? You choose!”. Of course, as in the average infamous PhD experience with supervisors, I could not say no 😂 Jokes aside, it was not the first time I had given a lecture at the University of Florence about Natural Language Processing, but this time I wanted to talk about something kind of new: retrieval augmented generation (RAG)....

June 5, 2024 · 5 min · Andrea Gemelli