Christophe Cerisara
cerisara@mastodon.online

There are many types of adapters for LLMs; although LoRA still seems to perform best
in most cases, other types have interesting advantages (a minimal LoRA sketch follows the list):
- for a review: arxiv.org/abs/2304.01933
- two interesting adapters: AdaMix and ladder side-tuning (LST); an improved extension of LST with low-rank attention is given in arxiv.org/abs/2402.04009
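
A minimal LoRA sketch for reference (my own illustration, not code from any of the papers above): a frozen pretrained linear layer plus a trainable low-rank update scaled by alpha/r.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha/r) * B A x ; only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# usage: wrap e.g. the q/v projections of each attention block with LoRALinear(proj)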

April 20, 2024
Christophe Cerisara
cerisara@mastodon.online

TrustLLM: leaderboard for evaluating the trustworthiness of LLMs:
trustllmbenchmark.github.io/Tr

April 15, 2024
Christophe Cerisara
cerisara@mastodon.online

best practices for synthetic data for LLMs, by Google: arxiv.org/abs/2404.07503

April 13, 2024
Christophe Cerisara
cerisara@mastodon.online

GoEX lets users verify the outputs and actions of an LLM agent post-facto,
while confining the LLM within a secure runtime and allowing its actions to be
"undone" before they go live: arxiv.org/abs/2404.06921
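
The general pattern, sketched below (my own minimal illustration of the idea, not the GoEX API): pair every agent action with a compensating undo and only commit after post-facto approval.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ReversibleAction:
    do: Callable[[], None]      # side-effecting action proposed by the LLM agent
    undo: Callable[[], None]    # compensating action that reverts it

def run_with_post_hoc_review(action: ReversibleAction, approve: Callable[[], bool]) -> bool:
    action.do()                 # in GoEX this would run inside the secure runtime
    if approve():               # post-facto verification by the user
        return True
    action.undo()               # roll back before the effect really "goes live"
    return False

# usage (hypothetical): ReversibleAction(do=lambda: db.insert(row), undo=lambda: db.delete(row))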

April 13, 2024
Christophe Cerisara
cerisara@mastodon.online

Combining context compression, retrieval and LoRA leads to LLoCO, a method that lets
vanilla LLMs handle long contexts: arxiv.org/abs/2404.07979

April 13, 2024
Christophe Cerisara
cerisara@mastodon.online

In the series of "transformers for search", here is Searchformer, which encodes the
dynamics of the good old A* algorithm as tokens and trains an encoder-decoder
model to minimize the number of search steps. It ends up being faster than the baselines
on Sokoban solving: arxiv.org/abs/2402.14083
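
To give an idea of what "encoding A* dynamics as tokens" can look like, here is a hedged illustration (mine, not the Searchformer tokenizer) that serializes the expansions of A* on a grid into a flat token sequence:

import heapq

def astar_trace_tokens(start, goal, neighbors, h):
    # emits tokens like "create x3 y1 c2 h4 ... close x3 y1 c2 h4 ... plan_found"
    tokens, g = [], {start: 0}
    frontier = [(h(start), 0, start)]
    while frontier:
        f, cost, node = heapq.heappop(frontier)
        if cost > g.get(node, float("inf")):
            continue                                   # stale queue entry
        tokens += ["close", f"x{node[0]}", f"y{node[1]}", f"c{cost}", f"h{h(node)}"]
        if node == goal:
            return tokens + ["plan_found"]
        for nxt, w in neighbors(node):
            if cost + w < g.get(nxt, float("inf")):
                g[nxt] = cost + w
                heapq.heappush(frontier, (g[nxt] + h(nxt), g[nxt], nxt))
                tokens += ["create", f"x{nxt[0]}", f"y{nxt[1]}", f"c{g[nxt]}", f"h{h(nxt)}"]
    return tokens + ["no_plan"]

# an encoder-decoder model is then trained to map task tokens to (shorter) traces plus the plan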

April 11, 2024
Christophe Cerisara
cerisara@mastodon.online

New paper for continual pretraining: Sailor, which shows the
importance of carefully tuning the learning rate to prevent
forgetting: arxiv.org/abs/2404.03608

Quite an interesting and reasonable point of view for CL.
It's also part of the bunch of papers showing that we really
don't need very fancy/complex architectures, and that just
careful hyper-parameter tuning of the vanilla transformer does the job...
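
A hedged sketch of the kind of tuning this alludes to (my own illustration; the actual Sailor recipe also tunes the data mixture): continue pretraining the same checkpoint under a few peak learning rates and keep the one that learns the new domain without forgetting the old one.

def sweep_continual_lr(train_fn, eval_ppl, base_ckpt, peak_lrs=(1e-5, 3e-5, 1e-4)):
    # train_fn(base_ckpt, peak_lr) -> continually pretrained model
    # eval_ppl(model, split) -> perplexity on "new" (target) or "old" (original) data
    results = {}
    for lr in peak_lrs:
        model = train_fn(base_ckpt, peak_lr=lr)
        results[lr] = (eval_ppl(model, "new"), eval_ppl(model, "old"))
    # pick the lr with the best trade-off; here simply the lowest summed perplexity
    return min(results, key=lambda lr: sum(results[lr]))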

April 11, 2024
Christophe Cerisara
cerisara@mastodon.online

Another milestone in long-range attention, from Google: arxiv.org/pdf/2404.07143.pdf
The trick is to save the past KV cache into a compressed memory, at every layer, and to
inject it (a kind of forward-only recurrence) into the attention when processing the next segment.
This leads to a potentially infinite context length.
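
A simplified sketch of the mechanism as I understand it (single head, no delta-rule update, gating and normalization details may differ from the paper): read from a compressed memory of past segments with a linear-attention style lookup, mix it with local attention, then fold the current segment's keys/values into the memory.

import torch
import torch.nn.functional as F

def infini_segment_attention(q, k, v, mem, z, beta):
    # q, k, v: (seg_len, d) for the current segment; mem: (d, d); z: (d,); beta: 0-dim gate tensor
    sq, sk = F.elu(q) + 1, F.elu(k) + 1                             # positive feature maps
    mem_read = (sq @ mem) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)  # retrieve past segments
    local = F.scaled_dot_product_attention(
        q.unsqueeze(0), k.unsqueeze(0), v.unsqueeze(0), is_causal=True
    ).squeeze(0)                                                    # standard attention within the segment
    g = torch.sigmoid(beta)
    out = g * mem_read + (1 - g) * local                            # mix long-term and local context
    mem = mem + sk.T @ v                                            # compress this segment into memory
    z = z + sk.sum(dim=0)
    return out, mem, z                                              # mem/z are carried to the next segment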

April 11, 2024
Christophe Cerisara
cerisara@mastodon.online

Great set of small tutorials to try various applications (RAG-based, LLM evaluation, etc.)
with open-source models: huggingface.co/learn/cookbook/

April 02, 2024
Christophe Cerisara
cerisara@mastodon.online

Excellent tutorial on VAE and diffusion models:
arxiv.org/abs/2403.18103

March 31, 2024
Christophe Cerisara
cerisara@mastodon.online

The 2024 LLM scaling laws revisited: arxiv.org/abs/2403.17844
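
As a reminder, the Chinchilla-style parametric form that scaling-law papers typically fit is the one below, with N the parameter count and D the number of training tokens (whether this particular paper keeps exactly this form is an assumption on my part):

L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}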

March 31, 2024
Christophe Cerisara
cerisara@mastodon.online

Removing entire layers, up to 40% of them for Llama,
followed by a bit of PEFT healing, does not degrade performance that much:
arxiv.org/abs/2403.17887
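
A hedged sketch (mine, not the paper's code) of what this looks like with Hugging Face transformers + peft: drop a contiguous block of decoder layers, then run a short LoRA "healing" fine-tune. The block dropped here is arbitrary; the paper selects it from a layer-similarity analysis.

import torch.nn as nn
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint
start, n_drop = 20, 10                                     # arbitrary block, for illustration only
kept = [l for i, l in enumerate(model.model.layers) if not (start <= i < start + n_drop)]
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# brief PEFT "healing": LoRA on the attention projections, trained on a small LM corpus
peft_model = get_peft_model(
    model, LoraConfig(task_type="CAUSAL_LM", r=8, target_modules=["q_proj", "v_proj"])
)
# ...then run a standard causal-LM fine-tuning loop on peft_model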

March 31, 2024
Christophe Cerisara
cerisara@mastodon.online

Nice post with details on the Yi-9B model:
it was obtained by expanding the Yi-6B LLM with additional layers
before further training on 800B tokens. Growing networks strike back!
Another "surprising" (well, maybe not that much anymore...) heuristic is
to train with a constant LR and increase the batch size when the loss
stops decreasing:
huggingface.co/blog/lorinma/yi
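
A hedged sketch of the depth-expansion trick (my illustration; exactly which layers Yi duplicates and how the new ones are initialized are in the linked post): grow a decoder-only model by copying a block of its layers before continuing pretraining.

import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B")    # example checkpoint
layers = list(model.model.layers)
dup_start, dup_end = 12, 28                                    # assumed block to duplicate
grown = layers[:dup_end] + [copy.deepcopy(l) for l in layers[dup_start:dup_end]] + layers[dup_end:]
model.model.layers = nn.ModuleList(grown)
model.config.num_hidden_layers = len(grown)
# ...then continue pretraining the grown model (the post mentions ~800B extra tokens)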

March 31, 2024
Christophe Cerisara
cerisara@mastodon.online

wanna train an LLM? *the* video you wanna watch: youtube.com/watch?v=2-SPH9hIKT

March 29, 2024
Christophe Cerisara
cerisara@mastodon.online

infinite-length context generalization: arxiv.org/pdf/2308.16137.pdf

Identifies and analyses the three main factors that prevent context-length generalization
(even with relative positional encodings), and proposes a lambda-shaped attention mask.
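
A hedged sketch of what a "lambda-shaped" mask means in practice (my reading: each token attends to the first few tokens plus a local window; exact definitions are in the paper):

import torch

def lambda_mask(seq_len: int, n_global: int = 4, window: int = 2048) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)     # query positions
    j = torch.arange(seq_len).unsqueeze(0)     # key positions
    causal = j <= i
    local = (i - j) < window                   # the diagonal branch of the lambda
    global_ = j < n_global                     # the vertical branch: always-visible first tokens
    return causal & (local | global_)          # True = attention allowed

# e.g. lambda_mask(8, n_global=2, window=3) shows the two branches of the lambda shape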

March 16, 2024
Christophe Cerisara
cerisara@mastodon.online

Sophia is about twice as fast as Adam;
it cheaply estimates the diagonal of the Hessian:
arxiv.org/pdf/2305.14342.pdf
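
A hedged sketch of the update as I remember it from the paper (coefficients approximate; the diagonal-Hessian estimator itself, refreshed only every few steps, is not shown):

import torch

def sophia_step(p, grad, m, h, lr=1e-4, beta1=0.96, gamma=0.01, eps=1e-12):
    # m: EMA of gradients; h: EMA of a cheap diagonal Hessian estimate (updated infrequently)
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    update = (m / torch.clamp(gamma * h, min=eps)).clamp_(-1.0, 1.0)  # per-coordinate clipping
    p.add_(update, alpha=-lr)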

March 16, 2024
Christophe Cerisara
cerisara@mastodon.online

Open vs. closed-source LLMs trend [chart].

Source: ARK Investment Management LLC, 2023. Data as of November 10, 2023

March 14, 2024
Christophe Cerisara
cerisara@mastodon.online

Studying the inductive bias of LLMs using random networks: arxiv.org/abs/2403.02241

March 14, 2024