Data distillation for #LLM: gives the same performance with 0.1% of the data,
but on a limited, focused evaluation afaik: does this approach yield
models that are as generic as training on the full data?
https://arxiv.org/abs/2310.09983
Interesting alternative for long context in #LLM:
it walks through the documents and adapts the prompt accordingly:
https://arxiv.org/abs/2310.05029
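Roughly the flavour of the idea, as a minimal sketch (a simplified gloss, not the paper's exact algorithm; `call_llm` is a hypothetical client you would plug in):

```python
from typing import Callable


def chunk(text: str, size: int = 2000) -> list[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def walk_and_answer(document: str, question: str,
                    call_llm: Callable[[str], str]) -> str:
    """Walk through the document chunk by chunk, carrying running notes
    forward so the prompt adapts as the model reads further."""
    notes = ""
    for piece in chunk(document):
        prompt = (
            f"Question: {question}\n"
            f"Notes so far: {notes}\n"
            f"New passage: {piece}\n"
            "Update the notes with anything relevant to the question."
        )
        notes = call_llm(prompt)
    # Final pass: answer from the accumulated notes only.
    return call_llm(f"Question: {question}\nNotes: {notes}\nAnswer the question.")
```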
Get help from an #LLM in your command line: gorilla-cli
A very nice project from UC Berkeley!
52 cognitive #bias cards in French
A CC-BY-licensed card deck of 52 cognitive biases, in French.
http://olkihost.loria.fr/cerisara/52-cartes-biais-cognitifs-laurence-vagner-stephanie-walter.pdf
New #PhD opportunity in Nancy, France, on continual learning of large language models:
https://members.loria.fr/CCerisara/#phdANR.html
Don't hesitate to contact us for further details!
Translating to English before #nlp processing works better than multilingual processing; self-translation also works, though a bit less well, and the delta might decrease with scale.
And if you're looking for a good open translation #llm, NLLB-200 is recommended by the authors:
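A minimal sketch of the translate-then-process recipe, assuming the Hugging Face transformers translation pipeline (the checkpoint "facebook/nllb-200-distilled-600M" is one of the public NLLB-200 variants, chosen here just as an example):

```python
from transformers import pipeline

# Translate non-English input to English with an NLLB-200 checkpoint,
# then run any English-only NLP model on the result.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="fra_Latn",   # NLLB uses FLORES-style language codes
    tgt_lang="eng_Latn",
)

text = "Les grands modèles de langue apprennent à partir d'énormes corpus."
english = translator(text, max_length=256)[0]["translation_text"]
print(english)
# Feed `english` to your downstream English #nlp pipeline (classification, NER, ...).
```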
Scaling laws for 4-bit #llm: https://arxiv.org/abs/2212.09720
It's better to use more parameters at 4-bit precision than fewer parameters at 16-bit.
Also, SpQR improves over QLoRA with good scaling laws:
https://arxiv.org/abs/2306.03078
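Not SpQR itself, but as a practical illustration of fitting a larger model in the same memory budget at 4-bit, here is a sketch using the standard bitsandbytes NF4 path in transformers (the checkpoint name is only a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes: roughly 4x less weight memory than
# 16-bit, so a model with more parameters fits on the same GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Scaling laws suggest", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```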
Another great report on training and finetuning details for Llama2:
https://arxiv.org/abs/2307.09288