Christophe Cerisara
cerisara@mastodon.online

Data distillation gives the same performance with only 0.1% of the data, but, as far as I know, on a limited and focused evaluation: does this approach yield models as generic as those trained on lots of data?
arxiv.org/abs/2310.09983

October 21, 2023

Interesting alternative for handling long contexts, where the model walks through the documents and adapts its prompt accordingly:
arxiv.org/abs/2310.05029

October 21, 2023

Get help in your command line with gorilla-cli, a very nice project from UC Berkeley!

github.com/gorilla-llm/gorilla

October 21, 2023

52 cognitive biases in French

A CC-BY licensed card deck of 52 cognitive biases in French.

olkihost.loria.fr/cerisara/52-

August 26, 2023

New opportunity in Nancy, France, on continual learning of large language models:

members.loria.fr/CCerisara/#ph

Don't hesitate to contact us for further details!

August 07, 2023

Translating to English before processing works better than processing the input directly in its original language with a multilingual model. Self-translation, where the model translates its own input, also helps, although a bit less, and that gap might shrink with scale.

And if you're looking for a good open translation model, the authors recommend NLLB-200:

arxiv.org/abs/2308.01223
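The three setups compared above can be sketched as plain function composition. This is only an illustration, not the paper's code: `llm` and `mt` are hypothetical callables standing in for a language model and an external translation system (such as NLLB-200), and the translation prompt below is a made-up placeholder.

```python
from typing import Callable

# Hypothetical type: any text-in, text-out model (LLM or MT system).
TextFn = Callable[[str], str]


def direct(llm: TextFn, text: str) -> str:
    # Baseline: the multilingual model processes the input in its original language.
    return llm(text)


def translate_test(mt: TextFn, llm: TextFn, text: str) -> str:
    # External MT translates to English first, then the model processes the English text.
    return llm(mt(text))


def self_translate(llm: TextFn, text: str) -> str:
    # The same model first translates its own input (placeholder prompt), then processes it.
    english = llm(f"Translate into English: {text}")
    return llm(english)


# Toy stand-ins so the sketch runs; replace them with real models.
def toy_mt(text: str) -> str:
    return {"Bonjour": "Hello"}.get(text, text)


def toy_llm(prompt: str) -> str:
    prefix = "Translate into English: "
    if prompt.startswith(prefix):
        return toy_mt(prompt[len(prefix):])
    return prompt.upper()  # pretend that "processing" means uppercasing


print(direct(toy_llm, "Bonjour"))                  # BONJOUR
print(translate_test(toy_mt, toy_llm, "Bonjour"))  # HELLO
print(self_translate(toy_llm, "Bonjour"))          # HELLO
```

The only difference between the last two setups is who does the translation: a dedicated MT model or the language model itself.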

August 04, 2023

Scaling laws for 4-bit quantization: arxiv.org/abs/2212.09720
For a fixed memory budget, it's better to use more parameters in 4 bits than fewer parameters in 16 bits.
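The trade-off comes down to simple arithmetic: at the same memory budget, 4-bit weights let you store four times as many parameters as 16-bit weights. A minimal sketch follows. The helper names are hypothetical, and it counts weight storage only, ignoring activations and the extra per-block scales that real 4-bit formats store.

```python
# Weights-only memory estimate. Ignores activations, KV cache and the extra
# per-block scales/zero-points that real 4-bit formats store (assumption).
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Gigabytes (10**9 bytes) needed to store n_params weights."""
    return n_params * bits_per_param / 8 / 1e9


def params_for_budget(budget_gb: float, bits_per_param: int) -> float:
    """Largest parameter count whose weights fit in budget_gb."""
    return budget_gb * 1e9 * 8 / bits_per_param


budget = weight_memory_gb(7e9, 16)  # a 7B-parameter model in 16 bits
print(f"7B at 16 bits: {budget:.1f} GB")  # 14.0 GB
print(f"Same budget at 4 bits: {params_for_budget(budget, 4) / 1e9:.0f}B params")  # 28B
```

The sketch only shows the size side of the trade-off; the paper's point is that, at equal total bits, the larger 4-bit model tends to give better accuracy.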

Also, SpQR improves over QLoRA and shows good scaling laws:
arxiv.org/abs/2306.03078

August 04, 2023

Another great report on training and fine-tuning details for Llama 2:
arxiv.org/abs/2307.09288

August 04, 2023