Christophe Cerisara
cerisara@mastodon.online

OpenChat 7b: github.com/imoneoi/openchat
Is it really so good?

November 03, 2023

Two lessons about LLMs from this paper: 1) we still don't understand how to "control" LLMs and get the best out of them; 2) Bloom is a greatly underrated model! (Several other papers also show that Bloom is really a good model; I should take the time to compile them all soon...)

arxiv.org/abs/2307.11760

November 03, 2023

Continual learning of LLMs requires weighting every token according to its importance; otherwise, new facts are not memorized well enough. This weighting can be done by meta-training a model: great work from Stanford here! arxiv.org/abs/2305.15076
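The paper meta-trains a model to produce the weights; as a minimal sketch of just the weighting step (not their meta-training), assuming per-token log-probabilities are available, the loss could look like:

```python
def weighted_token_loss(token_logprobs, token_weights):
    """Negative log-likelihood averaged over tokens, with a per-token
    importance weight so that new facts contribute more to the gradient."""
    assert len(token_logprobs) == len(token_weights)
    total_w = sum(token_weights)
    return -sum(w * lp for lp, w in zip(token_logprobs, token_weights)) / total_w

# Uniform weights reduce to the usual mean NLL; upweighting the second
# token makes its (worse) log-prob dominate the loss.
uniform = weighted_token_loss([-0.1, -2.3, -0.2], [1.0, 1.0, 1.0])
focused = weighted_token_loss([-0.1, -2.3, -0.2], [1.0, 5.0, 1.0])
print(uniform, focused)
```

With uniform weights this is plain cross-entropy; the whole question the paper answers is where the non-uniform weights should come from.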

November 02, 2023

How to tune the hyper-parameters of a giga-size LLM?
muP: tune them on a small model, then reuse the same values for the big one:
arxiv.org/pdf/2203.03466.pdf
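One piece of the recipe, sketched as a toy helper (the real muP recipe, and the authors' `mup` package, also rescales initializations and per-layer multipliers; the 1/width factor below is just the hidden-layer Adam learning-rate transfer rule):

```python
def mup_transfer(base_lr, base_width, target_width):
    """Illustrative muP-style transfer: the hidden-layer Adam learning rate
    tuned on a narrow proxy model is rescaled by 1/width for the big model,
    so no new hyper-parameter sweep is needed at full scale."""
    return base_lr * base_width / target_width

# Tune on a width-256 proxy, then reuse for a width-8192 model.
small_lr = 3e-3
big_lr = mup_transfer(small_lr, base_width=256, target_width=8192)
print(big_lr)
```

The point of muP is precisely that, in the right parameterization, the optimum found on the proxy stays (near-)optimal after this deterministic rescaling.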

October 30, 2023

A glimpse of hope for tackling catastrophic forgetting in LLMs: the model may not forget everything; part of the performance degradation may instead be due to a misalignment of the output classification hyperplane, with the knowledge still in there, especially for large models: arxiv.org/abs/2310.05644
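A toy illustration of the hyperplane-realignment idea (my own sketch, not the paper's procedure): keep the backbone features frozen and refit only a linear head, here with a plain perceptron on 2-D features.

```python
def refit_head(features, labels, epochs=100, lr=0.1):
    """Refit only the linear output layer on frozen features: if forgetting
    is mostly a misaligned decision hyperplane, this cheap step recovers
    accuracy without touching the backbone. (Toy 2-D perceptron sketch.)"""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Frozen "backbone" features for an old task; only the head is retrained.
feats = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 0, 0, 1]  # linearly separable, so a linear head suffices
w, b = refit_head(feats, labels)
acc = sum((1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
          for x, y in zip(feats, labels)) / len(feats)
print(acc)  # 1.0 on this toy data
```

If accuracy comes back after such a head-only refit, the "forgotten" knowledge was still encoded in the features all along.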

October 29, 2023

Very smart study on making the most of constrained data, suggesting that repeating the data 4x and doubling it with code (which is more available than text) helps:
arxiv.org/pdf/2305.16264.pdf

October 27, 2023

I've just come back from giving a talk, and I loved the community, the "geek" atmosphere, and the organizers' commitment to openness and free access. Thanks for the invitation!

October 27, 2023

Following the fantastic work of Carlini and Feldman, the field of extracting
or detecting specific data in pretrained LLMs is progressing rapidly:
swj0419.github.io/detect-pretr

This is yet another tool to check for private/protected/copyrighted data in LLMs,
such as books in GPT-3. Does it mean it will soon no longer be enough to keep the
pretraining data secret for brand-new LLMs? Or will counter-measures be developed
to better hide the pretraining data?
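The linked page describes the Min-K% Prob detection method; here is a minimal sketch of its scoring rule as I understand it, with made-up log-probability numbers:

```python
def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob membership score (sketch): average the log-probabilities
    of the k% least likely tokens. Text seen during pretraining tends to
    contain few very surprising tokens, so a higher score suggests the
    passage was in the training data."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# A "memorized" passage has uniformly high token probabilities ...
seen = [-0.2, -0.3, -0.1, -0.4, -0.2]
# ... while unseen text contains a few highly surprising tokens.
unseen = [-0.2, -4.5, -0.1, -6.0, -0.3]
print(min_k_prob(seen) > min_k_prob(unseen))  # True
```

In practice the score is compared to a calibrated threshold to decide membership; the numbers above are purely illustrative.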

October 27, 2023

Emergence of Theory of Mind in LLMs is not there yet;
designing more complex Theory-of-Mind tests shows the limits of LLMs:
hyunw.kim/fantom/

October 27, 2023

Improving the reasoning and planning abilities of LLMs with Language Agent Tree Search (LATS):
arxiv.org/abs/2310.04406

October 22, 2023

There have been several large improvements in math problem solving with LLMs recently, such as Llemma,
and here ToRA, which is trained by imitation learning to use external math tools:
arxiv.org/abs/2309.17452

October 22, 2023

LLMs can produce rules, which help in reasoning tasks and facilitate the interpretation of results.
Examples are given for arithmetic rules and family relations:
arxiv.org/abs/2310.07064

October 22, 2023

Retro 48b and InstructRetro: a foundation LLM pretrained with retrieval:
arxiv.org/abs/2310.07713

October 22, 2023


The first conference on Language Modeling: COLM
deadline: March 2024

colmweb.org/

October 22, 2023

Vanilla transformers may perform as well as S4 on the Long Range Arena
when pretrained with a denoising objective on the target-task data:
arxiv.org/abs/2310.02980

October 21, 2023

It is possible to somewhat control which abilities emerge in LLMs through specific data mixes;
one option to limit forgetting may be to train first on specific abilities, and then
on generic abilities mixed with a small amount of specific-ability data:
arxiv.org/abs/2310.05492
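A sketch of that two-stage schedule as I read it (the dataset names and the 5% replay fraction are made up for illustration):

```python
import random

def two_stage_mix(specific, general, stage2_steps, replay_frac=0.05, seed=0):
    """Stage 1: train on specialized data only. Stage 2: train mostly on
    general data, replaying a small fraction of specialized examples so
    the specific abilities are not forgotten."""
    rng = random.Random(seed)
    schedule = list(specific)  # stage 1: specialized abilities only
    for _ in range(stage2_steps):  # stage 2: mostly general data
        pool = specific if rng.random() < replay_frac else general
        schedule.append(rng.choice(pool))
    return schedule

sched = two_stage_mix(["math"] * 3, ["chat"] * 3, stage2_steps=1000)
replayed = sched[3:].count("math") / 1000
print(replayed)  # close to the 5% replay fraction
```

The interesting empirical question in the paper is which ordering and which mixing ratio preserve both ability sets; this sketch only makes the schedule concrete.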

October 21, 2023