Christophe Cerisara
cerisara@mastodon.online

FLM-101B: 90% of the performance of GLM-130B at 10% of the cost, thanks to growing the LLM during training:
costs are reduced because the model is much smaller for a large part of training. A similar conclusion is also reached
in the TokenFormer paper:

https://arxiv.org/pdf/2309.03852
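
Not FLM-101B's exact growth operator, but a minimal PyTorch sketch of the idea: train a small model, widen its layers by reusing the learned weights, then continue training the larger model.

import torch
import torch.nn as nn

def grow_linear(layer: nn.Linear, new_out: int) -> nn.Linear:
    # Widen the output dimension by cyclically duplicating existing rows,
    # so the grown layer starts close to the small model's function.
    # (Exact function preservation also requires rescaling the next layer.)
    grown = nn.Linear(layer.in_features, new_out, bias=layer.bias is not None)
    with torch.no_grad():
        idx = torch.arange(new_out) % layer.out_features
        grown.weight.copy_(layer.weight[idx])
        if layer.bias is not None:
            grown.bias.copy_(layer.bias[idx])
    return grown

small = nn.Linear(256, 512)
large = grow_linear(small, 1024)  # train small first, then grow and keep training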

4 hours ago
Christophe Cerisara
cerisara@mastodon.online

Two specific directions exist in all LLMs that encode whether a statement is true: a general truth direction, and a second one that handles negation:

https://www.alphaxiv.org/abs/2407.12831
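
The basic recipe for finding such a direction is a simple linear probe over hidden states. A generic difference-of-means sketch (not the paper's exact method; the activations below are random placeholders standing in for real layer-l hidden states):

import torch

def truth_direction(acts_true: torch.Tensor, acts_false: torch.Tensor) -> torch.Tensor:
    # Difference of class means over hidden states gives a linear "truth" direction.
    d = acts_true.mean(0) - acts_false.mean(0)
    return d / d.norm()

def truth_score(act: torch.Tensor, direction: torch.Tensor) -> float:
    # Project a statement's activation onto the direction;
    # the sign separates true from false statements.
    return float(act @ direction)

# In practice, acts_* are the LLM's hidden states on labeled true/false statements:
acts_true, acts_false = torch.randn(100, 4096), torch.randn(100, 4096)
t_general = truth_direction(acts_true, acts_false)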

16 hours ago
Christophe Cerisara
cerisara@mastodon.online

Survey of small LLMs: https://arxiv.org/abs/2409.15790v1

December 19, 2024
Christophe Cerisara
cerisara@mastodon.online

Open-source AI event in Paris on January 22nd, with a focus on "data-respectful" #LLMs: https://opensourceaisummit.eu/

December 19, 2024
Christophe Cerisara
cerisara@mastodon.online

What is a mathematical proof? An old but brilliant essay about our understanding of mathematics...

https://arxiv.org/pdf/math/9404236

December 19, 2024
Christophe Cerisara
cerisara@mastodon.online

Sparse Auto-Encoders, a hot topic right now in the mechanistic interpretability of #LLMs, are highly sensitive to the inductive biases of their training pipeline:

https://arxiv.org/abs/2410.11767
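
For reference, the core of such a sparse auto-encoder is tiny. A minimal PyTorch sketch with an L1 sparsity penalty (dimensions and hyper-parameters are illustrative):

import torch
import torch.nn as nn

class SparseAutoEncoder(nn.Module):
    # Decomposes residual-stream activations (dim d) into an
    # overcomplete set of m sparse, hopefully interpretable features.
    def __init__(self, d: int, m: int):
        super().__init__()
        self.enc = nn.Linear(d, m)
        self.dec = nn.Linear(m, d)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f

def sae_loss(x, x_hat, f, l1=1e-3):
    # Reconstruction error plus an L1 penalty that induces sparsity;
    # it is exactly such training choices (penalty, init, data order)
    # that the paper shows the learned features are sensitive to.
    return ((x - x_hat) ** 2).mean() + l1 * f.abs().mean()

sae = SparseAutoEncoder(d=4096, m=16384)
x = torch.randn(32, 4096)             # a batch of residual-stream activations
x_hat, f = sae(x)
loss = sae_loss(x, x_hat, f)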

December 19, 2024
Christophe Cerisara
cerisara@mastodon.online

LLM fine-tuning guide (100 pages): https://arxiv.org/pdf/2408.13296v1

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

New work on Monte Carlo Tree Search reasoning, which is also interpretable:
https://arxiv.org/abs/2410.01707
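
For context, the vanilla MCTS skeleton that such methods build on (a generic sketch, not the paper's exact algorithm; in the LLM setting, expand() would sample candidate reasoning steps and rollout() would score a completed chain):

import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper confidence bound balances exploitation and exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root, expand, rollout, n_iter=1000):
    # expand(state) -> list of child states; rollout(state) -> reward in [0, 1].
    for _ in range(n_iter):
        node = root
        while node.children:                     # 1. selection
            node = max(node.children, key=ucb)
        for s in expand(node.state):             # 2. expansion
            node.children.append(Node(s, node))
        leaf = random.choice(node.children) if node.children else node
        reward = rollout(leaf.state)             # 3. simulation
        while leaf:                              # 4. backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    # Assumes the root is expandable; return the most-visited child state.
    return max(root.children, key=lambda n: n.visits).state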

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

Correlation traps may occur in LLMs and be confounded with overfitting;
this can be detected by examining the weights of the LLM;
an open-source tool to do just that: https://weightwatcher.ai/
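
Basic usage, following the tool's documented API (gpt2 here is just an example model):

import weightwatcher as ww
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")  # any supported PyTorch model
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer spectral metrics (alpha, ...)
summary = watcher.get_summary(details)     # averages over layers; outlier alphas
print(summary)                             # flag under-trained or overfit layers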

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

Evidence that #LLMs reason by leveraging procedural knowledge from their pretraining data, not simply by combining partial solutions retrieved from a memorized pretraining corpus:
https://arxiv.org/abs/2411.12580

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

Practical use case of narrative creation with LLMs: https://nlr.ai/
A 300-page book (in French) created by a team of 10 LLM agents...
Work in progress...
https://github.com/Lesterpaintstheworld/terminal-velocity

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

Implementing Reinforcement Learning with LLMs:
https://arxiv.org/pdf/2411.14251

The state space is redefined as pure text (output by the LLM);
Temporal Difference learning and the Bellman equations are adapted to LLM text I/O;
Learning is demonstrated on a few classical, simple RL tasks: maze navigation,
TicTacToe, simplified chess...

This is another interesting use case of LLM agents and reasoning.
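
A minimal sketch of the core idea: tabular TD(0) where states are plain strings, e.g. textual board descriptions emitted by the LLM (in the paper the value function itself is also held in text, which this sketch simplifies to a dict):

from collections import defaultdict

# V maps a textual state description to its estimated value.
V = defaultdict(float)

def td0_update(state: str, reward: float, next_state: str,
               alpha=0.1, gamma=0.99):
    # Classic TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])

# Example transition over text states, as an LLM agent would produce them:
td0_update("maze: agent at (0,0), goal at (2,2)", 0.0,
           "maze: agent at (0,1), goal at (2,2)")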

December 03, 2024
Christophe Cerisara
cerisara@mastodon.online

Transformers learn the causal structure of their input with SGD:
https://arxiv.org/abs/2402.14735

November 24, 2024