Gaël Varoquaux

I finally found time for photography:
"Jazzy beats: Grandbrothers"

I took this picture with my mobile phone, during a concert at La Cigale (click through for full picture / highres)

March 07, 2024
GaelVaroquaux shared a status by ogrisel
Olivier Grisel

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number on this dataset in a few lines of code and 2 s (CV included) on my laptop.


December 07, 2023
Gaël Varoquaux

Software systems, more than any other engineering activity, create a technological world that results from social dynamics and constructs.
This is because the space of possibilities is much wider, and there are many more objects interacting than in other industrial endeavors.

Big thinkers of urban planning, who designed spaces and cities with interactions in mind, connected their thinking with sociology and related fields.

People thinking about software at the ecosystem level should probably do the same.

January 13, 2024
Gaël Varoquaux

With the […], the government is wielding xenophobia and wants to write discrimination into law.

This is the far right's program, a program of division rather than construction, a program that puts our democracy on a dangerous slope.

December 19, 2023
Gaël Varoquaux

An interview about scikit-learn: the project's vision, how to think about impact, the link with society, open-source dynamics... 45 min in which I talk about what motivates us and what we have learned about data and people...

It was a great pleasure; many thanks to the hymaïa team, including Yoann Benoit.
I realize that my enunciation is better in French 🙂

December 19, 2023
GaelVaroquaux shared a status by catalystcoop
Catalyst Cooperative

A thread from @GaelVaroquaux looking at the impact of the community-driven sklearn compared to centralized corporate ML packages. Community isn't always fast or easy, but it can be very robust over the long term once it's established.

"People underestimate how impactful @sklearn continues to be" — François Chollet

December 16, 2023
GaelVaroquaux shared a status by catalystcoop
Catalyst Cooperative

This new skrub library from @GaelVaroquaux & other folks behind @sklearn looks like it could be very useful for our work. It focuses on systematizing messy data prep, which we've had to do a lot of in applying ML for entity matching between agencies. E.g. linking plant and utility IDs between FERC, EPA, EIA so the different data they report can be used in tandem:

December 15, 2023
Gaël Varoquaux

Join us: this is open source, and the power of such a project is the ability to build in common.

Let's create together a much-needed tool for data science

December 14, 2023
Gaël Varoquaux

Skrub is very young, and there is a lot more that needs to be done.

For instance, we want to support multiple dataframe backends and lazy modes.

Our dream is to streamline developing machine learning and putting it into production, by coupling the scikit-learn API to database operations.

December 14, 2023