Christophe Cerisara
cerisara@mastodon.online

Edited #LLM knowledge may fail to propagate in multi-hop questions (e.g. "the birth country of the father of...");
this paper analyzes the 2 circuits that are responsible for each reasoning "hop" and uses them to edit knowledge:
https://arxiv.org/pdf/2503.16356

6 hours ago
Christophe Cerisara
cerisara@mastodon.online

#LLM adaptation with LoRA is still too costly, and several works try to further reduce this cost.
To do that, LoRAM proposes to first prune the LLM and then finetune only this pruned LLM with LoRA:
https://arxiv.org/pdf/2502.13533
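
A minimal sketch of that recipe as I read it (illustrative PyTorch, not the authors' code; layer size, sparsity and rank are arbitrary): magnitude-prune a frozen linear layer, then train only a LoRA-style low-rank adapter on top of the pruned weights. If I read the paper correctly, the trained low-rank matrices are then recovered for the original, unpruned model at inference, which this toy sketch omits.

```python
import torch
import torch.nn as nn

class PrunedLoRALinear(nn.Module):
    """Frozen, magnitude-pruned base weight plus a trainable low-rank (LoRA) update."""
    def __init__(self, linear: nn.Linear, sparsity: float = 0.5, rank: int = 8):
        super().__init__()
        w = linear.weight.data.clone()
        # Magnitude pruning: zero out the `sparsity` fraction of smallest weights.
        threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
        self.register_buffer("weight", w * (w.abs() > threshold))   # frozen, pruned
        self.register_buffer("bias", linear.bias.data.clone())      # frozen
        # LoRA factors: the only parameters that receive gradients.
        self.lora_A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(linear.out_features, rank))

    def forward(self, x):
        base = nn.functional.linear(x, self.weight, self.bias)
        return base + x @ self.lora_A.T @ self.lora_B.T

layer = PrunedLoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only the LoRA factors
```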

1 day ago
Christophe Cerisara
cerisara@mastodon.online

This paper proposes a new metric to locate knowledge neurons in #LLM, based on
the change in the target word's logits when the neuron is removed. It shows that both attention
and MLP layers encode knowledge, and that various types of knowledge are stored in different layers:

https://arxiv.org/pdf/2312.12141
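
A hedged sketch of that kind of attribution score (my illustration, not the paper's code; the GPT-2 module path, prompt and neuron index are placeholders): zero out one MLP neuron with a forward hook and measure the drop in the target token's logit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

def target_logit(prompt, target, layer=None, neuron=None):
    """Logit of `target` after `prompt`, optionally with one MLP neuron zeroed out."""
    handle = None
    if layer is not None:
        def ablate(module, inputs, output):
            output[..., neuron] = 0.0          # remove this neuron's activation
            return output
        # GPT-2's MLP hidden activations are produced by transformer.h[layer].mlp.c_fc
        handle = model.transformer.h[layer].mlp.c_fc.register_forward_hook(ablate)
    ids = tok(prompt, return_tensors="pt").input_ids
    target_id = tok(target, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    if handle is not None:
        handle.remove()
    return logits[target_id].item()

prompt, target = "The capital of France is", " Paris"
score = target_logit(prompt, target) - target_logit(prompt, target, layer=5, neuron=123)
print(f"attribution score of neuron (layer 5, index 123): {score:.4f}")
```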

March 08, 2025
Christophe Cerisara
cerisara@mastodon.online

Job opportunity in Nancy, France: research engineer for pretraining LLMs. Apply here: https://emploi.cnrs.fr/Offres/CDD/UMR7503-CHRCER-003/Default.aspx

March 05, 2025
Christophe Cerisara
cerisara@mastodon.online

Compression is prediction, and it is "closely related to the ability to generalize"; this is true for #LLM too, and Ilya Sutskever explains the effectiveness of unsupervised learning with compression. This paper exploits this connection to evaluate LLMs: https://arxiv.org/pdf/2402.00861
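
A minimal sketch of the compression connection (my illustration, not the paper's protocol; gpt2 and the sample string are placeholders): an LLM's negative log-likelihood on a text is, via arithmetic coding, the number of bits needed to compress it, so bits-per-character can serve as an evaluation metric.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_character(text, model_name="gpt2"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids -> mean next-token cross-entropy in nats over ids.shape[1]-1 predictions
        loss = model(ids, labels=ids).loss.item()
    total_bits = loss * (ids.shape[1] - 1) / math.log(2)   # nats -> bits
    return total_bits / len(text)

print(bits_per_character("Compression is prediction, and prediction is compression."))
```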

March 04, 2025
Christophe Cerisara
cerisara@mastodon.online

XLand-100B: a large-scale dataset of trajectories for 30,000 in-context reinforcement learning tasks
https://arxiv.org/pdf/2406.08973

February 27, 2025
Christophe Cerisara
cerisara@mastodon.online

https://arxiv.org/pdf/2501.09751 proposes an #LLM agent that progressively expands a knowledge tree using search engines, together with a conceptual pool that acts as a kind of cognitive network built through reflection, in order to better answer a question. It can be thought of as a very advanced RAG strategy.

February 27, 2025
Christophe Cerisara
cerisara@mastodon.online

This paper shows that #LLM in-context learning follows a scaling law (as with training), and that structured representations of in-context samples emerge beyond a given scale: very reminiscent of training scaling laws, isn't it? This also means that ICL may not work as well as it could when there are fewer than, say, 400 few-shot samples; hmm...

https://arxiv.org/abs/2501.00070
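
To make the "scaling law" claim concrete, here is a toy illustration with made-up numbers (not the paper's data): fit a power law, error ≈ a * n_shots^(-b), which is a straight line in log-log space.

```python
import numpy as np

n_shots = np.array([8, 16, 32, 64, 128, 256, 512])
error = np.array([0.42, 0.36, 0.30, 0.26, 0.22, 0.19, 0.165])   # fictitious ICL error rates

# A power law is a straight line in log-log space: log(error) = log(a) - b * log(n_shots)
slope, intercept = np.polyfit(np.log(n_shots), np.log(error), 1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: error ~ {a:.2f} * n_shots^(-{b:.2f})")
```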

February 25, 2025
Christophe Cerisara
cerisara@mastodon.online

A frozen text-only #LLM can handle other modalities in-context by using a modality encoder and training a projector that maps the encoder's embeddings into the LLM's input text embedding space:
https://arxiv.org/abs/2410.05629
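
A hedged sketch of that recipe (dimensions and the projector design are made up; only the projector would be trained): a frozen modality encoder produces embeddings, a small projector maps them into the frozen LLM's token embedding space, and the result is prepended to the text embeddings.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Trainable bridge from a frozen encoder's space to the frozen LLM's embedding space."""
    def __init__(self, enc_dim, llm_dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(enc_dim, llm_dim), nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, modality_emb):      # (batch, n_frames, enc_dim)
        return self.proj(modality_emb)    # (batch, n_frames, llm_dim)

enc_dim, llm_dim = 768, 4096
projector = ModalityProjector(enc_dim, llm_dim)
audio_emb = torch.randn(1, 50, enc_dim)   # output of a frozen modality encoder (e.g. audio)
text_emb = torch.randn(1, 12, llm_dim)    # frozen LLM token embeddings of the text prompt
llm_inputs = torch.cat([projector(audio_emb), text_emb], dim=1)
print(llm_inputs.shape)                   # (1, 62, 4096), fed to the LLM as inputs_embeds
```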

February 23, 2025
Christophe Cerisara
cerisara@mastodon.online

Fantastic blog and paper showing that #LLM reasoning relies on generalizing from procedural knowledge in the pretraining corpus, not on memorization: https://lauraruis.github.io/2024/11/10/if.html

February 23, 2025
Christophe Cerisara
cerisara@mastodon.online

LIMO is a paper very similar to the S1 model: it carefully crafts fewer than 1,000
high-quality long reasoning chains and shows that supervised finetuning on them gives even better results than o1;
one thing puzzles me a bit: they seem to select examples that these very o1 and R1 models handle poorly?

https://arxiv.org/abs/2502.03387

February 20, 2025
Christophe Cerisara
cerisara@mastodon.online

A rich survey of planning evaluation for #LLM agents
https://arxiv.org/abs/2502.11221

February 20, 2025
Christophe Cerisara
cerisara@mastodon.online

Augment #LLM knowledge edits with contextual edits extracted from a knowledge graph to facilitate multi-hop reasoning (e.g. who is the UK Prime Minister's wife): https://arxiv.org/abs/2502.10626

February 20, 2025
Christophe Cerisara
cerisara@mastodon.online

Yet another approximation of RLHF for #LLM alignment: after DPO, PPO, GRPO... this one exploits
a variational approximation, leading to, I think, a very interesting reward-weighted SFT algorithm:
https://arxiv.org/abs/2502.11026
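
A hedged sketch of what a reward-weighted SFT loss can look like (my interpretation of the general idea, not necessarily the paper's exact objective): weight each sampled completion's cross-entropy by a softmax over its reward.

```python
import torch
import torch.nn.functional as F

def reward_weighted_sft_loss(logits, labels, rewards, beta=1.0):
    """logits: (B, T, V); labels: (B, T) with -100 on prompt tokens; rewards: (B,)."""
    per_token = F.cross_entropy(logits.transpose(1, 2), labels,
                                ignore_index=-100, reduction="none")       # (B, T)
    mask = (labels != -100).float()
    per_seq = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    weights = torch.softmax(rewards / beta, dim=0)   # higher reward -> larger weight
    return (weights * per_seq).sum()

B, T, V = 4, 16, 100
loss = reward_weighted_sft_loss(torch.randn(B, T, V, requires_grad=True),
                                torch.randint(0, V, (B, T)),
                                torch.tensor([0.1, 0.9, 0.4, 0.6]))
print(loss.item())
```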

February 20, 2025
Christophe Cerisara
cerisara@mastodon.online

Continual pretraining of large LLMs (Llama2-7B) degrades perplexity:
https://arxiv.org/pdf/2402.17400

February 15, 2025
Christophe Cerisara
cerisara@mastodon.online

InnerThoughts: they just add an MLP that predicts the output from the last-token latent #LLM representations, or did I miss something? If that's all, what's the difference with a simplified Ladder Side Network?
https://arxiv.org/pdf/2501.17994
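
A sketch of my reading of the setup (GPT-2, the 4-way answer head and the prompt are placeholders): collect the last-token hidden states from every layer of the frozen LLM and train only a small MLP on top of them.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True).eval()
tok = AutoTokenizer.from_pretrained("gpt2")

def last_token_states(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids).hidden_states              # one (1, T, d) tensor per layer
    return torch.cat([h[0, -1] for h in hs])       # concatenated last-token states

n_layers, d = model.config.n_layer + 1, model.config.n_embd
head = nn.Sequential(nn.Linear(n_layers * d, 256), nn.ReLU(), nn.Linear(256, 4))
logits = head(last_token_states("Q: 2+2=? A) 3 B) 4 C) 5 D) 6. Answer:"))
print(logits.shape)   # (4,) scores over answer choices; only `head` is trained
```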

February 12, 2025
Christophe Cerisara
cerisara@mastodon.online

New method to adapt an #LLM to a specific task: a new PEFT method, Singular Value Fine-Tuning, combined with Transformer-Squared, which proceeds in 2 passes: a first pass to infer which experts to use, and a second pass for the actual inference: https://arxiv.org/abs/2501.06252
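
A hedged sketch of the Singular Value Fine-Tuning part as I understand it (illustrative, not the authors' code; the 512x512 layer is arbitrary): decompose a frozen weight matrix with SVD and learn only a small vector that rescales its singular values. In the Transformer-Squared scheme, as I understand it, several such vectors act as experts, and the first pass selects or combines them before the actual inference pass.

```python
import torch
import torch.nn as nn

class SVFLinear(nn.Module):
    """Frozen SVD factors of a weight matrix; only a singular-value scaling vector is trained."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        self.register_buffer("U", U)
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        self.register_buffer("b", linear.bias.data.clone())
        self.z = nn.Parameter(torch.ones_like(S))   # the only trainable parameters

    def forward(self, x):
        w = self.U @ torch.diag(self.S * self.z) @ self.Vh   # rescaled singular values
        return nn.functional.linear(x, w, self.b)

layer = SVFLinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # 512 values per matrix
```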

February 12, 2025
Christophe Cerisara
cerisara@mastodon.online

Sandbagging: LLMs can strategically underperform when revealing their true capabilities would be risky: https://arxiv.org/abs/2406.07358

February 10, 2025