OpenChat 7b: https://github.com/imoneoi/openchat
Is it really that good?
#LLM
2 Lessons about #LLM from this paper: 1) We still don't understand how to "control" LLMs and get the best out of them; 2) #Bloom is a greatly underrated model! (there are several other papers that show Bloom is really a good model; I should take the time to compile them all soon...)
Continual learning of #LLM requires weighting every token according to its importance; otherwise, new facts are not sufficiently memorized. This weighting can be done by meta-training a model: great work from Stanford here! https://arxiv.org/abs/2305.15076
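A minimal sketch of such an importance-weighted training loss, in plain NumPy. The weights here are hand-set for illustration, standing in for the output of the meta-trained weighting model; the function name is mine, not the paper's:

```python
import numpy as np

def weighted_lm_loss(logits, targets, token_weights):
    """Per-token cross-entropy, weighted by each token's importance.

    In the paper's setting the weights would come from a meta-trained
    model; here they are supplied by hand for illustration.
    """
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_token = -log_probs[np.arange(len(targets)), targets]
    # Normalize so the scale stays comparable to an unweighted mean.
    return (per_token * token_weights).sum() / token_weights.sum()

# Toy usage: upweight the last two tokens, e.g. those carrying a new fact.
logits = np.zeros((5, 10))                     # uniform predictions
targets = np.array([0, 1, 2, 3, 4])
weights = np.array([1.0, 1.0, 1.0, 3.0, 3.0])
loss = weighted_lm_loss(logits, targets, weights)
```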
How to tune the hyperparameters of a giant #LLM?
muP: tune them on a small model, then reuse the same values for the big one:
https://arxiv.org/pdf/2203.03466.pdf
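A toy sketch of the muP transfer rule for Adam (roughly: hidden-layer learning rates scale as 1/width and the output logits get a 1/width multiplier). The function and dict keys below are illustrative, not the actual `mup` package API; see the paper for the full parameterization:

```python
def mup_scaled_hparams(base_lr, base_width, target_width):
    """Transfer hyperparameters tuned at base_width to target_width.

    Simplified muP rule for Adam: hidden-layer lr scales as 1/width,
    input-layer lr stays constant, and the output logits are scaled
    down by base_width / target_width.
    """
    ratio = base_width / target_width
    return {
        "hidden_lr": base_lr * ratio,
        "input_lr": base_lr,      # unchanged under muP (Adam)
        "output_mult": ratio,     # multiplier on the output logits
    }

# Tune at width 256, reuse at width 1024: the hidden lr shrinks 4x.
hparams = mup_scaled_hparams(3e-4, base_width=256, target_width=1024)
```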
A glimpse of hope for tackling catastrophic forgetting in #LLM: the model may not forget everything; part of the performance degradation may instead be due to a misalignment of the output classification hyperplane, with the knowledge still in there, especially for large models: https://arxiv.org/abs/2310.05644
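If the knowledge is still there and only the output hyperplane moved, a cheap fix is to refit just the classification head on frozen features. A minimal sketch, using ridge regression on one-hot targets as a stand-in for a proper linear probe (all names and settings are my own illustration, not the paper's method):

```python
import numpy as np

def refit_output_head(features, labels, num_classes, reg=1e-3):
    """Re-align only the output layer, keeping the backbone frozen.

    Solves a ridge regression from frozen features to one-hot labels,
    a cheap stand-in for retraining the classification hyperplane.
    """
    one_hot = np.eye(num_classes)[labels]
    gram = features.T @ features + reg * np.eye(features.shape[1])
    W = np.linalg.solve(gram, features.T @ one_hot)
    return W  # new head: predictions = features @ W

# Toy check: features whose first dimension encodes the label
# are classified correctly again after refitting the head.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4))
labels = (feats[:, 0] > 0).astype(int)
W = refit_output_head(feats, labels, num_classes=2)
acc = ((feats @ W).argmax(axis=1) == labels).mean()
```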
Very smart study on making the most of constrained data: it suggests that repeating the data for up to 4 epochs, and doubling the data with code (which is more available than text), both help:
https://arxiv.org/pdf/2305.16264.pdf
I've just come back from giving a talk at #CodeurEnSeine and I loved the community, the "geek" atmosphere, and the organizers' commitment to free and open access. Thank you for the invitation! #CES2023
Following the fantastic work of Carlini and Feldmann, the field of extracting
or detecting specific data in pretrained #LLM is progressing rapidly:
https://swj0419.github.io/detect-pretrain.github.io/
This is yet another tool to check for private/protected/copyrighted data in #LLM,
such as books in GPT-3. Does this mean it will soon not be enough to keep the
pretraining data secret for brand-new #LLMs? Or will counter-measures be
developed to better hide the pretraining data?
Emergence of Theory of Mind in #LLM is not there yet;
designing more complex tests for #ToM shows the limits of LLMs:
https://hyunw.kim/fantom/
Improving #LLM reasoning and planning abilities with Language Agent Tree Search (LATS):
https://arxiv.org/abs/2310.04406
There have been several large improvements recently in #LLM math problem solving, such as Llemma
and, here, ToRA, which is trained by imitation learning to use external math tools:
https://arxiv.org/abs/2309.17452
#LLM can produce rules, which help in reasoning tasks and facilitate interpretation of results.
Examples are given for arithmetic rules and family relations:
https://arxiv.org/abs/2310.07064
Retro 48b and InstructRetro: a foundation #llm pretrained with retrieval:
https://arxiv.org/abs/2310.07713
#LLM
The first conference on Language Modeling: COLM
deadline: March 2024
A vanilla #transformer may perform as well as S4 on the Long Range Arena
when pretrained with a denoising objective on the target task data:
https://arxiv.org/abs/2310.02980
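As a rough illustration of what such a denoising objective looks like, here is a T5-style span-corruption sketch in plain Python. The mask token, rates, and span length are arbitrary choices of mine, not the paper's exact recipe:

```python
import random

def mask_spans(tokens, mask_token="<M>", mask_prob=0.15, mean_span=3, seed=0):
    """Corrupt a sequence by replacing random spans with a mask token.

    Returns (corrupted_input, masked_spans); the model is then trained
    to reconstruct the masked spans from the corrupted input.
    """
    rng = random.Random(seed)
    corrupted, spans = [], []
    i = 0
    while i < len(tokens):
        # Start a masked span with probability mask_prob / mean_span,
        # so roughly mask_prob of the tokens end up masked on average.
        if rng.random() < mask_prob / mean_span:
            spans.append(tokens[i:i + mean_span])
            corrupted.append(mask_token)
            i += mean_span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, spans

corrupted, spans = mask_spans(list(range(200)))
```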
It is possible to somewhat control which abilities emerge in #LLM through specific data mixes;
one option to limit forgetting may be to train first on specific abilities, and then
on generic abilities mixed with a small amount of specific-ability data:
https://arxiv.org/abs/2310.05492
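A minimal sketch of such a two-stage curriculum. The `replay_frac` knob and the function itself are my own illustrative assumptions, not the paper's exact recipe:

```python
import random

def two_stage_mix(specific, generic, replay_frac=0.1, seed=0):
    """Two-stage data schedule to limit forgetting of specific abilities.

    Stage 1: specific-ability data only.
    Stage 2: generic data plus a small replayed fraction of specific data.
    """
    rng = random.Random(seed)
    n_replay = min(int(len(generic) * replay_frac), len(specific))
    stage2 = list(generic) + rng.sample(list(specific), n_replay)
    rng.shuffle(stage2)
    return list(specific), stage2

stage1, stage2 = two_stage_mix(specific=list(range(10)),
                               generic=list(range(100, 200)))
```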