🤗 Model | 📂 GitHub | ✍️ Blog | 💡 Inspired by Sakana AI
I’ve always been interested in Small Language Models, so when I came across Sakana AI's paper on Evolutionary Optimization of Model Merging Recipes, it left a strong impression on me. The paper explores an innovative approach to optimizing two key spaces: the Data Flow Space (optimizing how layers from different models combine to form new models) and the Parameter Space (finding effective mixing strategies for model weights). Through these methods, Sakana AI devised a powerful way to discover optimal model combinations using evolutionary algorithms.
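To make the parameter-space side concrete, here is a minimal sketch of what a parameter-space merge of two models can look like. The checkpoint names and the single global mixing ratio are illustrative only; in practice, the mixing ratios (often per layer) are exactly what the evolutionary search tries to find.

```python
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint names, purely for illustration.
model_a = AutoModelForCausalLM.from_pretrained("org/model-a")
model_b = AutoModelForCausalLM.from_pretrained("org/model-b")

alpha = 0.6  # mixing ratio for model_a; values like this are what the search optimizes

state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Linear interpolation of corresponding weights. A single global ratio keeps the
# sketch simple; per-layer ratios give the evolutionary search more to optimize.
merged_state = {
    name: alpha * tensor + (1 - alpha) * state_b[name]
    if tensor.is_floating_point() else tensor  # leave integer buffers untouched
    for name, tensor in state_a.items()
}

model_a.load_state_dict(merged_state)
model_a.save_pretrained("merged-model")
```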
Model merging has become an essential part of scaling AI systems, as researchers and practitioners seek to combine the strengths of different models without retraining. Traditionally, evolutionary algorithms such as CMA-ES have been used to search for good merge recipes, but with the recent breakthroughs in LLMs, I’m exploring new ways to enhance this process.
I wondered how the optimization process might change if LLMs were directly responsible for managing the model combinations. This curiosity led me to start this project.
The goal of this project is to integrate LLMs into evolutionary strategies to optimize model merging. Instead of relying on CMA-ES, the idea is to let the search capabilities of LLMs drive the optimization of merge configurations.
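As a rough illustration of the idea, here is a minimal sketch of an LLM-driven evolutionary loop. This is not the project's actual implementation: `ask_llm` and `evaluate` are placeholder callables, and the prompt and JSON handling are simplified (real code would need to validate the LLM's response).

```python
import json

def propose_candidates(history, ask_llm, num_candidates=4):
    """Ask an LLM to propose new merge ratios, given previously evaluated ones.

    `history` is a list of (ratios, fitness) pairs; `ask_llm` is any function
    that sends a prompt string to an LLM and returns its text response.
    """
    prompt = (
        "You are optimizing per-layer mixing ratios for a model merge.\n"
        "Each ratio is a float in [0, 1]. Higher fitness is better.\n"
        "Previously evaluated candidates:\n"
        + "\n".join(f"ratios={r}, fitness={f:.4f}" for r, f in history)
        + f"\nPropose {num_candidates} new candidates as a JSON list of lists."
    )
    response = ask_llm(prompt)
    return json.loads(response)

def evolve(initial, evaluate, ask_llm, generations=10):
    """Minimal LLM-as-evolution-strategy loop: propose, evaluate, keep history."""
    history = [(ratios, evaluate(ratios)) for ratios in initial]
    for _ in range(generations):
        for ratios in propose_candidates(history, ask_llm):
            history.append((ratios, evaluate(ratios)))
    # Return the best candidate found so far.
    return max(history, key=lambda item: item[1])
```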
⚠️ Currently, the project supports optimization within the parameter space, but I plan to extend its functionality to enable merging and optimization in the data flow space as well.
During my research, I found an existing study, the Large Language Models As Evolution Strategies paper, that experimented with using LLMs as the driver of an evolutionary algorithm. What I found particularly interesting was that, in the early steps, the LLM-based evolution strategy converged much faster than other evolutionary algorithms, and that a larger model did not always mean faster convergence.
To validate the paper's findings, I conducted Black Box Optimization experiments on six functions. The functions were chosen to represent diverse types of optimization challenges, ranging from simple quadratic functions to more complex, noisy landscapes. The goal was to test how well LLM-based strategies adapt to various types of optimization problems. The results are shown below:
The LLM-based optimization method indeed converged more quickly, making it suitable for the model merging process.
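For context, the harness for such experiments can be as small as the sketch below. The six functions used in my experiments are not listed in this post, so two standard test functions (sphere and Rastrigin) stand in, and the `optimize(fn, dim, budget)` interface is assumed purely for illustration.

```python
import numpy as np

# Two standard black-box test functions, used as illustrative stand-ins.
def sphere(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def run_benchmark(optimize, dim=5, budget=100):
    """Run an optimizer on each test function with a fixed evaluation budget.

    `optimize(fn, dim, budget)` is an assumed interface returning the best
    point found; plug in the LLM-driven loop or CMA-ES to compare convergence.
    """
    return {
        name: fn(optimize(fn, dim, budget))
        for name, fn in {"sphere": sphere, "rastrigin": rastrigin}.items()
    }
```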
I implemented this approach based on the experiments and results in the paper, and the process for optimizing model merging with LLMs is as follows: