Chloé-Agathe Azencott

2025-03-23

Questionnaire Intelligence Artificielle et Enseignement

Le ministre de l’Enseignement Supérieur et de la Recherche a confié une mission sur « l’intelligence artificielle et l'enseignement » à François Taddei et Frédéric Pascal. Dans le contexte de cette mission, un questionnaire a été établi à destination de « l’ensemble des usagers et personnels de l’enseignement supérieur ».

Le questionnaire est anonyme, mais une fois que j'ai déclaré être une femme professeure en machine learning (considéré comme un sous-domaine de l'IA) à Mines Paris–PSL, j'ai l'anonymat d'un cygne au milieu d'un troupeau de canards. J'ai d'ailleurs choisi de donner mon nom à la fin du questionnaire, comme c'était proposé.

Au vu du temps et de l'énergie que répondre à ce questionnaire m'a pris, et des conversations semi-publiques que j'ai eues sur le sujet sur Mastodon, j'ai décidé de partager ici quelques unes de mes réponses. J'ai ajouté des liens hypertexte pour faciliter la lecture.

Tout d'abord il me semble important de noter que le questionnaire utilise « IA » sans définir le terme. Il m'a fallu plusieurs questions pour être sûre que l'enseignement et la recherche de et en intelligence artificielle, machine learning ou discipline approchante n'entraient pas dans le champ couvert par « IA ». Pire, si je suis raisonnablement convaincue qu'il s'agit ici de parler d'outils s'appuyant sur des modèles génératifs de texte ou d'images de grande taille, entraînés sur d'énormes volumes de donnés, je ne sais pas s'il s'agit plutôt de parler d'outils développés par des acteurs privés à destination du grand public (type ChatGPT, LeChat, MidJourney, Gemini, Claude, DeepSeek), ou plutôt des outils développés spécifiquement pour l'éducation (par exemple MathGPT). Je soupçonne que les outils prédatant les réseaux de neurones profonds et développés spécifiquement pour accompagner l'enseignement sont exclus—je pense par exemple au système expert générateur de problèmes de chimie organique pour étudiant·es de licence développé par Jonathan Chen, avec qui je partageais un bureau en thèse il y a une vingtaine d'années (Chen et al., 2009 doi:10.1021/ci900157k).

(Apparté : Jon est d'ailleurs une personne de référence sur l'utilisation d'outils d'IA générative dans le domaine médical, si le sujet vous intéresse.)

De nombreuses questions telles que « utilisez-vous l'IA dans votre vie professionnelle ? » sont donc particulièrement floues. J'ai utilisé les boîtes de texte des champs « Autre » pour préciser ma pensée, mais le sondage me semble mal conçu dès le départ.

Quels sont, selon vous, les risques d’une utilisation systématique de l’IA par les étudiants ?

Les risques directs commencent à être documentés : l'utilisation d'assistants basés sur GPT permet d'améliorer temporairement les performances des étudiantes et étudiants aux examens, mais lorsqu'on leur retire cette béquille, leurs notes se dégradent, montrant qu'ils et elles n'ont en fait pas acquis les compétences attendues (Bastani et al., 2024 doi:10.2139/ssrn.4895486) ; l'utilisation de ce types d'agents conversationnels, surtout quand on leur fait trop confiance, a un effet négatif sur les capacités cognitives des étudiantes et étudiants (Zhai et al., 2024 doi:10.1186/s40561-024-00316-7) ou des travailleuses et travailleurs des professions intellectuelles (Lee et al., 2025 doi:10.1145/3706598.3713778). L'utilisation systématique de ces outils ne permet pas aux étudiantes et étudiants de faire le cheminement mental nécessaire à l'apprentissage.

L'impact négatif sur la qualité du code produit par les développeurs et développeuses utilisant ces outils est aussi largement documenté (par exemple Perry et al., 2023 doi:3576915.3623157) avec les conséquences que l'on peut en tirer sur l'apprentissage de la programmation informatique et du développement numérique.

Un certain nombre de risques indirects existent aussi. L'utilisation de ces outils pose des aspects éthiques majeurs :

modèles entraînées sur des données mal acquises (Chesterman 2024 doi:10.1093/polsoc/puae006) : détournement du travail d'artistes (Jiang et al. 2023 doi:10.1145/3600211.3604681), acquisition illégale de données (The Unbelievable Scale of AI’s Pirated-Books Problem, Reisner 2025 in The Atlantic) ; incorporation des données des utilisateurs et utilisatrices ;
surexploitation du travail du clic (Casili 2019 hal-02173160 ; Casili et al. 2025hal-04952735) ;
impact écologique majeur dans un contexte de crise climatique (Resnik 2024 doi:10.21428/e4baedd9.cf3e35e5 ; Crawford 2024 doi:10.1038/d41586-024-00478-x ; [edit mai 2025] voir aussi la série Power Hungry: AI and our energy future de la MIT Technology Review) ;
génération de contenu biaisé, potentiellement blessant voire illégal (Vassel et al. 2024 doi:10.1609/aaaiss.v3i1.31251 ; UN Report A/HRC/56/68) ;
réduction de la diversité des nouveaux contenus (Doshi & Hauser 2024 doi:10.1126/sciadv.adn5290) et de la visibilité des discours déjà minoritaires (Arora et al. 2023 doi:10.1016/j.infoandorg.2023.100478) ;
appauvrissement de la qualité des contenus disponibles sur Internet, risquant d'ailleurs d'affecter négativement la qualité des modèles à venir (Shumailov et al. 2024 doi:10.1038/s41586-024-07566-y).

Quand bien même ils fonctionneraient (et j'insiste sur le fait que ce n'est pas établi), il n'est pas acceptable de fonder notre pédagogie sur de telles dérives.

Réserves sur l'utilisation d'outils d'IA en cours : Identifiez-vous d'autres risques/réserves ?

Cette question fait suite à une question demandant d'établir une hiérarchie dans mes réserves sur l'utilisation d'outils d'IA en cours, qui propose 5 réserves : pédagogiques (la pédagogie n'est pas automatisable) ; éducatives (temps d'écran, fracture numérique, inégalité d'accès aux ressources) ; écologiques ; techniques (qualité des infrastructures et du matériel) ; juridiques et éthiques.

Mes réserves pédagogiques ne peuvent pas être résumées à « la pédagogie n'est pas automatisable ». Les outils dont on parle ne permettent pas à ce stade d'automatiser la pédagogie, et mes réserves pédagogiques sont que ces outils ne permettent pas d'améliorer l'apprentissage, au contraire. Ils n'ont aucune notion de ce qu'est une réponse correcte ou non, fournissant seulement des réponses statistiquement probables et potentiellement fausses (« hallucinations »), encodant les biais présents dans leurs jeux d'entraînement, et produisant des résultats uniformément pauvres (« AI slop »).

Mes autres réserves sont politiques : la rapidité avec laquelle on essaie de nous pousser à adopter ces outils dans nos pratiques pédagogiques, alors même que les retours d'expérience sont mitigés—on est loin d'avoir vu un tel engouement pour la classe inversée ou non-notée, malgré une riche littérature scientifique sur le sujet depuis des années voire des décennies—est inquiétante. Dans un contexte de réduction des moyens d'un enseignement supérieur déjà exsangue, je crains qu'il ne s'agisse ici pas de pédagogie mais de démanteler encore un peu plus l'université française.

Il est aussi important de remarquer que ce mouvement se fait sous l'impulsion d'acteurs économiques puissants (Google, Microsoft, OpenAI, Anthropic, MistralAI, Nvidia etc), privés, qui défendent ici leurs propres intérêts. Beaucoup de ces acteurs portent une idéologie délétère, contraire à notre mission de service public (Debru & Torres 2024 doi:10.5210/fm.v29i4.13636). Certains d'entre eux entretiennent des liens forts avec l'administration Trump, dont l'autoritarisme et le refroidissement des relations avec l'UE ne font plus aucun doute. L'utilisation de ces outils pose donc des problèmes géopolitiques et de souveraineté nationale.

Selon vous, quel pourrait être le rôle des établissements d’enseignement supérieur dans les transformations sociétales liées à l’IA ?

Comme pour toute transformation sociétale : l'étudier, avant d'y plonger bille en tête.

Quelles sont vos principales inquiétudes concernant le développement de l’IA dans la formation ?

Encore une fois :

les outils d'IA générative n'ont pas fait les preuves de leur utilité dans les formations ;
on cherche à remplacer les enseignants / réduire les moyens mis dans la formation en utiilsant des outils qui ne fonctionnent pas ;
quand bien même ces outils fonctionneraient, leur coût écologique, éthique et géopolitique est trop élevé.

Selon vous, que manque-t-il aujourd’hui pour développer l’IA dans l’enseignement supérieur ?

En supposant une fois de plus que la question se réfère non pas à enseigner l'IA et les disciplines liées, mais à utiliser pour enseigner des outils basés sur des modèles génératifs : des outils fiables, dont l'utilité pédagogique soit prouvée, qui ne posent pas de problèmes écologiques, éthiques, ou géopolitiques.

Dans l’hypothèse d’une omniprésence des IA dans 5 ans, comment concevez-vous les établissements d'enseignement supérieur (formation, gestion…) ?

Je ne suis pas sûre de ce qui est entendu par « omniprésence des IA », mais je ne vois pas en quoi les missions des établissements d'enseignement supérieur seraient affectées. Nous devrons en particulier continuer de former l'esprit critique de nos étudiantes et étudiants, et à leur donner la formation nécessaire à comprendre ces outils au-delà des effets de mode et d'un imaginaire superficiel. S'il s'agit d'imaginer que les outils type ChatGPT soient fortement intégrés à tous les aspects de la formation, indépendamment de leur utilité/nocivité, je ne peux qu'entrevoir un impact négatif sur les établissements d'enseignement supérieur, dont la qualité des formations se trouverait fortement dégradée.

Selon vous, quelles sont les premières mesures à mettre en œuvre concernant l’IA dans les établissements de l’enseignement supérieur ?

Mener une réflexion sur comment faire évoluer nos enseignements étant donné que ces outils existent et que nos étudiantes et étudiants sont susceptibles de s'en servir (de manière intense et systématique). Il s'agit d'une part de leur permettre de comprendre comment ces outils fonctionnent, leurs limites et les nombreux problèmes que leur utilisation pose, et d'autre part de changer nos enseignements et évaluations pour prendre en compte ces pratiques et notamment continuer d'évaluer les étudiantes et étudiants et pas ChatGPT.

Quelle est votre opinion sur le développement et l'intégration de l'IA dans les formations ?

Au vu de la place que l'IA prend dans nos sociétés, il est essentiel que nos formations forment à l'IA : c'est-à-dire à savoir, techniquement, comment fonctionnent les outils qui en relèvent, et socialement, quels sont les problèmes juridiques, éthiques, sociaux, environnementaux que leur utilisation soulève.

Par contre, intégrer des outils d'IA générative à nos pratiques pédagogiques n'est ni fait ni à faire. Dans l'état actuel de l'art, l'immense majorité de ces outils ne répondent à aucun besoin pédagogique, et leur utilisation soulèverait par contre de nombreux problèmes (manque de fiabilité des outils, nocivité contre-productive de leur utilisation en contexte éducatif, nombreuses barrières éthiques). Il est essentiel de prendre le temps d'évaluer l'efficacité pédagogique de telles pratiques, leurs implications éthiques, et plus largement leur impact sur nos sociétés (Giannakos et al. 2024 doi:10.1080/0144929X.2024.2394886).

2023-01-27

Message de Stanislas Guérini via la DGFiP et droits Informatique et Liberté

Pour les agents de la fonction publique ayant reçu un message non sollicité de Stanislas Guérini sur l'adresse email liée à leur compte ENSAP, et ayant constaté que la mention « Si vous ne souhaitez plus recevoir ce type de courriel, merci de vous désabonner dans votre espace Particulier, rubrique « Gérer mon profil » sur impots.gouv.fr » en bas dudit message est doublement fallacieuse, d'une part car il n'y a pas de telle possibilité dans la gestion de profil sur impots.gouv.fr, et d'autre part car l'adresse email utilisée n'est pas celle liée au compte impots.gouv.fr, il est possible de faire exercer vos droits Informatique et Libertés en suivant les indications de la CNIL.

J'ai personnellement écrit à la DPO de la DGFiP à donnees-personnelles-mes-droits@dgfip.finances.gouv.fr afin de demander la suppression de mes coordonnées de leur fichier d'envoi de publicités, et saisi la CNIL en parallèle ici.

Voici (sans aucune recommandation juridique) le texte de mes courriers :

Email à donnees-personnelles-mes-droits@dgfip.finances.gouv.fr :

Objet : Opposition à recevoir de la publicité

Texte : Madame, Monsieur,

J'ai reçu aujourd'hui 27 janvier 2023 un message de la Direction Générale des Finances Publiques <ne-pas-repondre@dgfip.finances.gouv.fr> dont l'objet est « Réforme des retraites : Message de Stanislas GUERINI », adressé aux agents de la Fonction publique.

Il s'agit d'un message non sollicité de communication politique sur un projet de loi dont la discussion n'a pas débuté au parlement et qui n'a aucun lien avec mon activité professionnelle.

En bas de ce message il est mentionné « Si vous ne souhaitez plus recevoir ce type de courriel, merci de vous désabonner dans votre espace Particulier, rubrique « Gérer mon profil » sur impots.gouv.fr ». Ceci est manifestement faux, pour deux raisons :

- d'une part il n'y a aucun réglage dans la rubrique mentionnée correspondant à l'envoi par courriel de ce type de communication ;

- d'autre part l'adresse email liée à mon compte sur impots.gouv.fr n'est pas celle à laquelle j'ai reçu ce message.

Conformément aux dispositions de l’article 21.2 du RGPD, je vous remercie de bien vouloir supprimer mes coordonnées de vos fichiers d’envoi de publicités.

Je vous rappelle que vous disposez d’un délai maximal d’un mois suivant la réception de ce courrier pour répondre à ma demande, conformément à l’article 12.3 du RGPD.

Je vous prie d’agréer, Madame, Monsieur, l’expression de mes salutations distinguées.

Texte de la plainte à la CNIL :

Madame, Monsieur,

J'ai reçu aujourd'hui 27 janvier 2023 sur mon adresse email personnelle un message de la Direction Générale des Finances Publiques (DGFiP) <ne-pas-repondre@dgfip.finances.gouv.fr> dont l'objet est « Réforme des retraites : Message de Stanislas GUERINI », adressé aux agents de la Fonction publique (voir PJ).

Il s'agit d'un message non sollicité de communication politique sur un projet de loi dont la discussion n'a pas débuté au parlement et qui n'a aucun lien avec mon activité professionnelle.

En bas de ce message il est mentionné « Si vous ne souhaitez plus recevoir ce type de courriel, merci de vous désabonner dans votre espace Particulier, rubrique « Gérer mon profil » sur impots.gouv.fr ». Ceci est manifestement faux, pour deux raisons :

- d'une part il n'y a aucun réglage dans la rubrique mentionnée correspondant à l'envoi par courriel de ce type de communication ;

- d'autre part l'adresse email liée à mon compte sur impots.gouv.fr n'est pas celle à laquelle j'ai reçu ce message.

Selon France TV Info, ce message a été reçu par plus de deux millions de fonctionnaires : https://www.francetvinfo.fr/replay-radio/le-brief-politique/info-franceinfo-retraites-quand-le-gouvernement-tente-de-convaincre-les-fonctionnaires-dans-une-video_5579064.html

Je saisis aujourd'hui la CNIL afin de signaler d'une part l'utilisation abusive d'adresses email personnelles par un organisme gouvernemental, et d'autre part l'impossibilité de s'opposer à cette utilisation en suivant les instructions communiquées en bas de message.

Parallèlement, et conformément aux dispositions de l’article 21.2 du RGPD, j'ai contacté la DGFiP à l'adresse email <donnees-personnelles-mes-droits@dgfip.finances.gouv.fr> pour demander à ce que mes coordonnées soient supprimées de leurs fichiers d’envoi de publicités (voir PJ).

2022-03-14

Some work practices (2022 update)

This is a list of tools and working practices I am trying to encourage in myself and people working under my supervision.

2021-05-13

Un autre numérique reste possible

This post is in French, as it is a copy of a text written in French with other French researchers and informed by our experience of the French system.

Miroir du texte publié en CC-BY-4.0 sur Medium

Signataires :

Chloé-Agathe Azencott, Enseignante-chercheuse en mathématiques appliquées à MINES ParisTech

Anne Baillot, Professeure des Universités à l’Université du Mans, Etudes germaniques et Humanités numériques

Frédéric Clavert, Professeur assistant en histoire contemporaine, C2DH, université du Luxembourg

Alix Deleporte, Maître de Conférences, Institut Mathématique d’Orsay, Université Paris-Saclay

Julie Giovacchini, Ingénieur de recherche en Analyse de sources anciennes et Humanités numériques, CNRS, Centre Jean Pépin (UMR8230)

Anne Grand d’Esnon, Doctorante en littérature comparée, Université Bourgogne-Franche-Comté

Catherine Psilakis, Université de Lyon 1

Une dystopie numérique universitaire se dessine, dont l'émergence est accélérée par la crise sanitaire. Il est néanmoins toujours temps de faire du numérique à l'université un outil au service des enseignant·e·s-cherch.eur·euses·s, ingénieur·e·s et étudiant·e·s – et plus largement de toutes celles et ceux qui enseignent, cherchent et transmettent le fruit de leur recherche.

2018-05-17

Machine learning approaches to disease prediction

I've had the great pleasure to spend a few days in Copenhagen attending, first, a symposium on Big Data approaches to health, disease and treatment trajectories, and second, a two-day workshop on machine learning approaches to disease prediction.

The workshop, organized by Rikke Linnemann Nielsen, Agnes Martine Nielsen and Ramneek Gupta, had around 40 attendees, and featured Jason Moore, Marylyn Ritchie, Andrea Califano, Laurent Gautier and myself as invited speakers.

There was a lot of time built in for discussion, and I wanted to summarize here some of the points that were raised because I think they can be very useful.

Understanding the machine learning algorithms you use is key. In particular, run simulations, and check whether the algorithm / implementation you are using behaves as you expect it on them. Yes, this is boring, but essential, as however else are you going to trust that it's the right tool for your problem? Marylyn drove that point very well.

No algorithm is going to solve all your problems. It's not because a method worked beautifully on a paper you've read that it's going to be good for your problem, and certainly not with the default parameters. In my own words, there's this little thing called the no free lunch theorem.

Some algorithms are what Jason refers to as frozen accidents. Someone had an idea, tried it on some data, got it published, and then for the following twenty year the entire community believes the way to treat vaguely similar data is with that idea and nothing else. Challenge that. (You'll still probably need to use the frozen accident in your paper, but maybe you can also present an alternative that's better for your problem.)

Please take a step back and think before using t-SNE. What are you trying to do exactly? Remember, t-SNE is a dimensionality reduction tool, not a clustering algorithm. You can use it for visualization. What do you think of your visualization before DBSCAN or any other clustering algorithm colors the points in different clusters? What happens to it if you change the perplexity? Remove a couple of samples?

I keep repeating this, and I will repeat it again: you do not evaluate your model on the data you've used to train it. Feature selection is part of training. Model selection is part of training. If your model changes when you add an observation to your evaluation set, it means that your validation is not evaluating generalization properly. Clean validation requires holding out a data set that you only use for evaluating performance after having selected features, tweaked hyperparameters, and so on and so forth. DREAM challenges are good for this, by the way.

Choosing the right evaluation criterion (or criteria) is crucial. Area under the ROC curve may well not be informative for a problem with a high class imbalance, for instance.

Electronic health records are bringing new machine learning challenges that seem far from solved. How do you deal with time series data where each sample has its own time points? How do you deal with heterogeneous data types? How do you deal with sloppy data? How do you deal with missing data that can be missing for very different reasons?

About missing data, we've spent a lot of time discussing imputation, and we're not big fans. It seems like a great way to introduce more noise and biases in data that already has more than its share of it. In addition, data from EHR can be missing from very different reasons. Did the patient not get that blood work done because the medical doctor did not prescribe it? Because the patient hates needles? Because she cannot afford the test? Or were the results just not entered in her record?

If your goal is to do translational research, you need to understand what clinicians need. On the symposium on Tuesday, Thorkild Sørensen made the excellent point that the only thing clinicians care about is to improve patient care. What is a good measure of clinical utility for your problem? A simple, interpretable model may not perform as well as a deep boosted random kernel forest – I'm expecting royalties if you're actually creating that algorithm, by the way –, but it may still perform better than the current tools. Also, what does interpretability means for them? Is this needed for this particular problem?

About p-values, remember that statistics are not biology. If we can agree that all what we're doing with our computers is to generate hypotheses (rather than biological knowledge), there's no clear evidence that a p-value is a more meaningful score than a random forest feature importance score, a regression weight or whatever else you want to compute. On the other hand, p-values are a good tool to compare what you're doing with random chance, and you can construct a null for about anything by permutation testing.

Finally, we also talked a lot about negative results, which for us are mainly the whole bunch of methods we tried to apply to our data and that led us nowhere. There was a large consensus that those are science, and they're interesting to the community, and they should be published. There is also a general agreement that publishing them is not easy, and that you cannot get a PhD / faculty position / grant based on these type of results only. Sadly.

Oh, and here are my Tuesday slides on network-guided sparsity and my Wednesday slides on multitask approaches.

one comment

2016-06-20

Some work practices

This is a list of tools and working practices I am trying to develop for myself and people working under my supervision. They are not set in stone and are meant to evolve according to the people and project.

2016-03-17

Local user installation of gcc

On our compute cluster, I needed gcc-4.8.4 to compile some code. At the global level, gcc-4.4.7 is installed, and I do not have superuser privileges on the system (which is, all things considered, a good thing).

Here are my notes on how I installed gcc-4.8.4 locally, without superuser privileges, in case they might one day be of use to someone...

2015-04-30

Numbering PDF Pages With pdftk

I've recently been putting together applications (to ask for funding for PhD students) that required me to create single PDF files containing various documents (some I had produced myself, some I had scanned). I knew how to do this using gs or pdftk, but then I found myself with large documents (around 30 pages) and realized page numbers would be really helpful. "I'm sure there's a way to do this with pdftk," I thought to myself. And sure enough, there was.

So here are the magic commands.

2015-04-17

PyData Paris 2015

I attended PyData Paris at Telecom ParisTech on April 3, 2015. It was a great experience! I realized I missed interacting with more hard core programmers.

The notes I took on Twitter are storified here.

I also gave a (too) short presentation on using Python for the DREAM challenges. My slides are here and the video of my talk is on Youtube.

2015-03-27

Beware of circularity: Evaluating SNV deleteriousness prediction tools

If you're working with next-generation sequencing (NGS) human data, chances are at some point you will be interested in automatically determining which of your sequence variants are more likely to have deleterious effects. A first step is often to focus on missense single nucleotide variants (SNVs), i.e. substitutions of a single nucleotide that result in a different amino acid. Indeed those are disproportionately deleterious compared to other variants [MacArthur et al., 2012]. In addition, you can filter out common variants, which are presumably less likely to be deleterious. But that's still a lot of variants to contend with, and that's where SNV deleteriousness prediction comes into play.

There are many tools (see this list at OMICtools) that are dedicated to the problem of predicting whether a missense SNV is deleterious (a.k.a. pathogenic, damaging, disease-causing) or neutral (a.k.a. tolerated, benign, non-damaging). Some, such as SIFT, are based on sequence conservation, under the premise that disrupting a highly conserved sequenced will be more damaging. Others, like PolyPhen-2, try to assess the effect of amino acid changes on protein structures. CADD mixes several types of genomic information. And a few tools, such as Condel, combine the outputs of other tools.

Back in 2012, we set out with the following question: given the simplicity of current prediction methods (compared to the complex machine learning models that we are usually manipulating), couldn't we come up with better annotation tools than what was out there? We started toying around with a few ideas, and soon enough had to wonder how exactly to validate the methods we were proposing. So we started investigating the state of the art and benchmark data sets in more details... and we fell down the rabbit hole.

Our story, which we just published in Human Mutation, is in essence very simple. It boils down to one of the basic commandments of machine learning: Thou shall not test on your training set, meaning that if you evaluate your prediction tool on the same data that was used to build it, you'll have no idea whether it's any good on new data or not (a phenomenon typically referred to as overfitting). To take an extreme example, if your algorithm looks up in a file the hard-coded values it should return, it will perform perfectly on the variants that are in this file, and be utterly unable to make predictions for other variants (which, presumably, is the interesting part).

Put like that it sounds rather obvious. However, the community has pushed itself in a corner where it's becoming really difficult — if not downright impossible — to properly compare deleteriousness prediction tools.

The first reason is that the publicly available benchmark data sets typically used for evaluating tools overlap with the databases used to build some of these tools. Others have pointed this out before us, and endeavored to develop independent benchmark data sets [Nair and Vihinen, 2013]. However, there can still be some overlaps (mainly in neutral variants). Furthermore, not all authors disclose the variants they used to build their tools. It is impossible to guarantee that an evaluation data set does not contain some of these variants, and hence to guarantee fairness when comparing these tools against others.

The second reason is more subtle. It turns out that, in an overwhelming majority of cases, when one variant of a gene is annotated, all other variants of that gene that are available in the database also have the same annotation. This is due to the way these data sets are put together and does not necessarily reflect biological reality. However this means that you can very efficiently leverage the annotation of other SNVs in the same gene to build what will appear to be a very accurate tool; but there is no guarantee that this tool will perform well on new variants. The evaluation of such tools (e.g. FatHMM in its weighted version, as well as tool combinations such as the latest version of Condel) is heavily biased by this phenomenon.

Our paper demonstrates the negative effects of these two types of circularity (which we're calling that way because they result in relying on (somewhat) circular reasoning to draw conclusions about the performance of the tools). Actually, the pervasiveness of these effects is such that we found it impossible to draw any definite conclusion on which of the twelve tools we tested outperforms the others. Note that in most of the cases where we can measure performance without these biases, we obtain accuracies that are significantly worse than usually reported.

So how can we move forward? In our opinion, releasing not only the data that were used for training the tools, but also the precise descriptors and algorithms used by each tool would be the best way to get out of this quandary: anyone could perform stratified cross-validations, and determine the best algorithm to be trained on the union of all available data, resulting in the best possible tool.

At the very least, authors should release which variants are in their data (even if they don't release their annotations), so that others can avoid circularity when comparing new methods to theirs. They should also abstract themselves as best as possible from the second type of circularity we described. For this purpose, we recommend reporting accuracies for varying values of the relative proportions of pathogenic and neutral variants in the gene to which the SNV belongs.

There are a few more questions that remain open.

Which transcript should be used when a tool requires features of the gene in which the SNP appeared? Others have used the transcript yielding the most deleterious score. In order to use the same transcript for each tool, we settled on the canonical transcript. The results we report weren't much affected by this choice, but I think it is a question worth considering.

More importantly, what do "deleterious", or "pathogenic", or "damaging" mean exactly? Different authors have different definitions, meaning that not all these tools set out to address exactly the same problem. How can you then compare them? Along those lines, we should also systematically disclose the source of evidence for annotations in the benchmark data sets (as is generally done, for example, in gene function prediction). Indeed it is possible that some annotations come themselves from tool predictions, hereby artificially inflating the apparent performance of these tools.

Finally, the whole field relies on the premise that some mutations are inherently more damaging than others, but I am expecting a lot of other factors, such as other variants, all sorts of environmental or clinical variables, and the specific disease you're interested in, to come into play. The fact that we report better-than-random performance shows there is some validity in this assumption, but how far can we really get? What is the best accuracy we can reach? And, given the rate at which we are accumulating knowledge about all possible missense SNVs, how long will it take before we have annotated all of them experimentally and do not require any predictive algorithm any more?

You can read the full story and find all the data and the Python scripts we used at Human Mutation: Dominik G. Grimm, Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G. MacArhur, Kaitlin E. Samocha, David N. Cooper, Peter D. Stenson, Mark J. Daly, Jordan W. Smoller, Laramie E. Duncan, Karsten M. Borgwardt. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity, Human Mutation, 2015 doi: 10.1002/humu.22768

Disclaimer: this blog note has been written by myself alone and reflects my personal take on this work and on the domain, which is not necessarily that of all co-authors.

2015-03-17

En direct du labo

J'ai eu le plaisir la semaine dernière de partager un peu ma vie de chercheuse sur le compte Twitter @endirectdulabo, sur lequel chaque semaine, un scientifique différent partage son quotidien et ses recherches. J'ai ainsi pu parler de génomique, de machine learning, de big data, de recherche médicale ou de la mise au point de nouveaux médicaments, mais aussi de la vie du labo et du monde académique (femmes en informatique, la vie après la thèse, comparaisons avec l'Allemagne et les États-Unis, etc.). Le tout a été archivé par les petites mains de l'équipe sur Storify !

2014-08-15

Making posters with Scribus

A few notes on my poster-making workflow as I am getting ready for the ICPR Workshop on Features and Structures (FEAST) 2014. When in Tübingen I was using LaTeX and the template of the MLCB group. However I recently decided to come back to the ways of my graduate student days and use the desktop publishing application Scribus. Scribus is much more flexible in terms of layout and design; it is open source, works cross-platform, and makes it possible to integrate LaTeX equations.

I am certainly using only a small fraction of the possibilities of Scribus. However, here's my workflow:

2013-08-13

Dotclear turns 10

Just a shout out for Dotclear, the little blog engine that could! I've been a happy Dotclear user since 2007, and it was a natural choice to power this website when I started it last year. I am very happy to see this project, powered by a small, open-source, French community, finding its second wind in time for its 10th anniversary.

Happy birthday, Dotclear, and here's to the next ten years!

2013-07-07

Notes from JOBIM 2013

JOBIM (Journées Ouvertes Biologie, Informatique et Mathématiques) is the French conference in bioinformatics. I attended the 14th edition in Toulouse, thanks to a travel fellowship from the conference that made it possible for me to travel from Germany and give a talk on network-guided multi-locus association mapping with graph cuts ([slides]).

2013-06-21

Installing PyGTK on Mac OS X 10.7

This is how I installed PyGTK on my office machine, a Mac with OS X 10.7.5 (aka Lion) on which I've never managed to properly use fink or macports and gave up trying to install homebrew. In other words, without a package manager. Look, mommy, no hands!

I'm putting those notes here in case it might help someone struggling with similar problems. Please try using a package manager first, and save yourself some headache.

2013-03-15

Spring Travels

I will be away from Tübingen in the next three weeks, attending SMILE in Paris on Monday, March 18th, as well as the Workshop in Computation, Inference and Optimization at IHÉS on Wednesday, March 20th in Bures-sur-Yvette (France).

I'm also looking very much forward to visiting EBI Cambridge at the end of the month. I will be giving a talk on network-guided multi-locus genome-wide mapping on Tuesday, March 27th at 11am.

2013-02-17

Data Mining in Bioinformatics Course

I will be teaching a few of the lectures in the course "Data Mining in der Bioinformatik" from February 18 to March 1st.

Lecture slides:

Day 5: Frequent subgraph mining
Day 7: Clustering in bioinformatics: clustering gene expression data
Day 9: Graph mining for chemoinformatics and drug discovery

2012-11-29

NIPS 2012

I'll be attending NIPS next week, and am very much looking forward to what promises to be a great scientific week.

I will also be presenting a poster on my first results in graph-based feature selection^[1] at the Machine Learning in Computational Biology workshop on December 7. I've been working with Dominik Grimm, Yoshinobu Kawahara and Karsten Borgwardt on the problem of finding single-point mutations that are maximally, jointly associated with an observed trait, while being connected in an underlying (predefined) biological network. We've been rather successful at dealing with the large (10^5 to 10^7) number of features involved, as in our experiments the method turns out to be fast, robust, and generally lead to better recall than our state-of-the-art comparison partner, the overlapping group lasso, for very similar precisions.

The method is currently called SOS for Subnetworks of Optimal SNPs, but I'm not very happy with the name and I'm considering renaming it SConES (Selection of Connected Explanatory SNPs).

Note

[1] Although I have a lot of experience treating problems in which the objects themselves are represented by graphs (and the way they are connected is very much object dependent), I had never studied a setting in which the objects are not graph-like, but there is an underlying network that connecte their features (completely independently of the objects).

2012-07-13

Lindau Nobel Laureates Meeting 2012

I was last week one of the lucky ("has qualified in a global competition among young scientists worldwide to participate", my certificate of attendance says emphatically and somewhat repetitively) six hundred or so young researchers participating in the Nobel laureates meeting in Lindau. Twenty-seven laureates were attending this sixty-second edition focused on physics. Physics, as you may know, is something I've stopped studying as soon as I was allowed to, and quite frankly I was a bit afraid that I wouldn't understand anything. Thankfully, most of the conversations remained pretty high-level and I was able to follow quite more than I was expecting.

What a week! It was an exciting place to be, especially the week when the CERN (the idea of which apparently arose at a similar meeting in Lindau) announced having found a (possibly Higgs) boson (you can watch the reactions of a few of the laureates here). It was an inspiring meeting, what with the great conversation with a bunch of damn smart people (and I don't only mean the laureates), memorable talks, and an impressive amount of social occasions... I've never been (and will probably never again be) so well treated at a conference. There were, of course, quite a good amount of welcome addresses and other inaugural talks, and a few uncomfortable moments — such as the induction of the new members to the honorary senate of the foundation organizing the meetings: the President of Singapore, a country that might be doing cool science but which democracy remains questionable at best, and the CEO of Volkswagen, introduced by a video that was to my eyes nothing but a commercial for his company.

Lindau Hafen

Lindau harbor

Great memories

I came back with many great memories... where to begin? Explaining support vector machines to Brian Schmidt (2011 Physics Nobel laureate, for the discovery that the universe's rate of expansion is accelerating) is certainly one of my favorites. Another one, far from scientific concerns, would be the few words I exchanged with Hans-Peter Ochsenhofer, violist of the Vienna Philharmonic, and member of the ensemble that played a Haydn string quartet and a Mozart clarinet quintet at the concert following the opening ceremony. Countess Bettina, daughter of Count Lennart who initiated those meetings and President of the Council for the Lindau Nobel Laureate Meetings, did after all insist in her welcome address on the similarities between art and science, which both open minds, breech frontiers, and involve creativity.

Just before the Nobel laureate meeting started, the research ministers of the G8+5 countries were meeting in Constance to discuss international research policy; a few of them took part in a boat tour on the lake to which a few dozens of us young researchers were invited as well, together of course with the laureates who had already arrived. I talked with Geneviève Fioraso (the French Minister of Research and Education), mostly about the lack of consideration for PhD holders in France, as well as with John Holdren (chief scientific advisor to President Obama), who had very interesting things to say about deciding of a science budget and trying to convince politicians that scientists must take risks (and fail often) for real advances to be made.

I also attended a dinner hosted by the Max Planck Society (my academic sponsor), at which I sat close to Theodor Hänsch (Physics Nobel laureate 2005, for the development of laser-based precision spectroscopy), and I have fond memories of the conversations, from his eagerness to discuss science to his memories of California. Sadly, Hänsch did confirm what I'm hearing way too often these days: that your postdoc is the best time of your career...

Conversations highlights

The usual themes (science, academia vs. industry, balancing science and the rest of one's life, women in STEM, teaching, publish or perish, the two-body problem, navigating different cultures, and pretty much anything else under the sun, including, of course, soccer) were very present in the conversations among young researchers (so it turns out that computer scientists aren't that different from physicists after all).

Atomkraft protest

A protest against nuclear energy in Lindau — a great kick off to a week filled with many discussios on energy!

In what follows I'll try to briefly summarize my favorite talks and discussions. The talks should be available in the Mediatheque (which, unfortunately, requires Silverlight to run, and doesn't seem to work with Moonlight on my Ubuntu laptop — yes, I have complained about it already. At least it works on my Mac.).

Monday Science Breakfast: On the Brink of an Era of Quantum Technologies With Colin Teo (PhD Student at NUS), Artur Ekert (Director of the Centre for Quantum Technologies at NUS), and Bill Phillips (1997 Physics Nobel Laureate for his work on cooling and trapping atoms with lasers, and NIST/JQI fellow). A discussion on quantum cryptography and the necessity to close the loopholes (detection, locality, and free will) in Bell tests experiments. As much as I love the free will loophole (is there an entity able to fool us into believing that we observe a violation of Bells inequalities?), someone (Phillips, maybe?) made an excellent point by asking which of human factors or quantum mechanics loopholes is more likely to compromise quantum cryptography. Eckert admitted that the idea that a property truly does not exist until measured is confusing as hell. I don't know whether I should find this comforting or not, but I'm glad it's not just laypeople like me who have a hard time wrapping their mind around this concept. (Same goes for the temperature of a single atom, which even as someone used to manipulating probability distributions I find disturbing.) This was followed by a conversation on Qubits. Phillips stated that in his opinion there's a 50-50 chance we'll have a quantum computer able to factorize numbers in polynomial time 50 years from now. He suspects the Qubit will be a combination of quantum objects.

Monday morning, Part I: Cosmology Three plenary lectures by Brian Schimdt, John Mather and George Smoot (both 2006 Physics Nobel laureates for the discovery of the anisotropy of the cosmic microwave background, consistent with the Big Bang model).

Schmidt condensed an entire series of lecture in 30 minutes about the standard model; Mather gave us a long list of the telescopes that allow us to see farther and farther; and Smoot discussed the mapping of the Universe in both time and space. To be honest I mostly remember the very pretty pictures and the excerpt from Contact that Smoot showed. Well, taking pretty pictures of our universe is a large part of what cosmologists do, right?

Monday morning, Part II: Climate Change, Global Warming, and Energy Four plenary lectures by Paul Crutzen, Mario Molina (both 1995 Chemistry Nobel laureate for the discovery of the role of nitrogen oxide in the ozone hole), Ivar Giaever (1973 Physics Nobel laureate for his work tunneling in supraconductors) and Hartmut Michel (1988 Chemistry Nobel laureate for his work on membrane proteins involved on photosynthesis).

While Crutzen gave a very sobering picture of the Anthropocene, this geological age we have entered that is influenced by mankind, and concluded that "a daunting task lies ahead for scientists and engineers", Giaever defended the exact opposite view, took offense at the American Physics Society's statement that evidence for global warming is incontrovertible ("unlike," he added, "the mass of the proton"), and delivered a provocative lecture about the alarmist fallacies of global warming. Unfortunately, this address was, as far as I could tell, peppered with inaccuracies and gross generalizations; self-admittedly, it stems from a few hours spent on Google rather than a careful review of the literature. Here's a pretty good deconstruction.

Molina, who preemptively dismissed the position defended by Giaever in a famous Wall Street Journal editorial, presented some scientific evidence of climate change (in particular of the increasing incidence of extreme climatic events), discussed possible strategies to reduce the emission of CO2 (including a new generation of more secure nuclear power plants with better waste management), and addressed some political issues, with an emphasis on the regrettable position of U.S. Republicans.

Eventually, Michel closed the morning session with a lecture on photosynthesis, biomass and biofuels, essentially demonstrating how inefficient they are (natural photosynthesis, which we are far from being able to approximate, included). He also spent some time addressing the production of palm oil biofuels, which he deemed to be "one of the most stupid things" (for it destroys the rain forest and actually leads to the emission of more CO2). His vision for the future includes genetic engineering of plants to improve their CO2 fixation and light absorption capabilities, the improved transport of solar energy produced in deserts (using superconducting cables), and the development of better batteries to improve electricity storage.

Monday afternoon discussion with Mario Molina Lots and lots of ideas in that conversation! Here are a few of the topics we touched: climate change as a symbol of government intervention for the Tea Party, the distrust of scientists (Molina pointed out that people also distrust Al Gore because he's not a scientist), developing countries (good solutions can accelerate development rather than slow it down), geo-engineering (there's nothing wrong with having a plan B but the community should be very, very cautious), nuclear waste (which biggest problem is known as "NIMB", or Not In My Backyard). Molina talked a lot about rephrasing the problem to overcome political barriers: working with the media (which is what so-called interest groups are doing), mentioning air quality improvement, and presenting optimistic solutions. For instance, in response to the ever increasing human population, he prefers addressing the improvement of the standard of living and women empowerment rather than advocating (as does Giaever) a one children per family policy.

When asked how to form one's own opinion in scientific controversies, Molina gave us two choices: either dwell in the literature yourself, or trust the way the peer-reviewed scientific community works... later on, he reminded us that there have been much more controversial scientific theories than climate change (Boltzmann in the early 20th century was fighting groups of scientists who did not believe that molecules exist).

Lindau Hafen

Lindau harbor

Tuesday morning session: Quantum Mechanics Six plenary lectures (and a coffee break) with Martinus Veltman (1999 Physics Nobel laureate for the elucidation of electroweak interactions), Carlo Rubbia (1984 Physics Nobel laureate for the discovery of W and Z particles), David Gross (2004 Physics Nobel laureate for the discovery of asymptotic freedom), Albert Fert (2007 Physics Nobel laureate for the discovery of giant magnetoresistance), Bill Phillips and Brian Josephson (1973 Physics Nobel laureate for the discovery of, well, the Josephson effect of a supercurrent through a tunnel barrier).

The topic could not have been more aptly chosen; while the breakfast room at my hotel was already bustling with the news of the incoming CERN conference, Veltman brandished a copy of a local paper, telling us that this was now where one could read the latest scientific news. "This week might later be known as Higgs week", he added, before starting his lecture which, being titled The LHC at CERN and the Higgs, was a pretty good introduction to the unfolding events. He concluded with a few words of the implications of a Higgs field on cosmology; either the universe started out curved the other way around and was "flattened out" when the Higgs field came, or there is no such thing as a Higgs field, or our understanding of gravity is wrong, he explained.

Rubbia followed with a lecture on neutrinos which he thinks must fill many of the gaps of our understanding of the standard model. His talk focused on the LSND anomaly and electron neutrino disappearance in Gallium experiments, and the work to explore them at Fermilab (ICARUS), CEA (Lucifer reactor), and Daya Bay.

Gross then presented a history of quantum mechanics, which he assesses to be 100 years old in average (as their starting point can be either chosen as 1900, Planck, or 1925, Heisenberg), and many pictures of the Solvay Conference. He contrasted the anguish of the early 20th century at the idea of the failure of classical physics (illustrated by this sentence by Lorentz: "The old theories have been showed to be powerless to pierce the darkness that surrounds us on all sides") and the current enthusiastic agreement that QM works, makes sense, and resists (for now) attempts to break it. He pointed out, however, that we still lack a unified theory of physics that encompasses string theory, QM, and spacetime, which makes it exciting times for young physicists, with still so many amazing questions to answer.

In a more practically-oriented lecture, Fert followed with a presentation of spintronics in modern ICT, focused mostly on STT-RAM, spin-transfer oscillators, spintronics on graphene and carbon nanotubes, and memristors for neuromorphic computing.

Eventually, Phillips delivered a rather neat lecture on artificial magnetic fields. Unfortunately my notes are pretty obscure at that point, so I won't say more than recommend you watch it if you're interested.

Whether or not Josephson's lecture fits under the "quantum mechanics" label I could not say; his talk about a unifying theory of physics and spirituality, based on the idea of a disconnect between mathematics and reality, featuring expressions such as "subtle biosphere", and partly relying on a dancing metaphor of what seemed to me very much like self-organizing multi-agent systems minus the maths, left me (and all I discussed it with) rather confused.

Tuesday afternoon discussion with Albert Fert Mostly, a conversation about the future of CMOS: memristors, spintronics on graphene for displays, spintronics on diamond for quantum computing, oxides (such as LSMO) for spintronics, spin-LEDs, ferroelectric tunnel junction, and STT-RAM vs. PC-RAM for non-volatile memory (apparently Samsung stopped their research on phase-change and is putting all their effort on STT-RAM).

Lindau from the lake

Lindau — lakeview

Wednesday morning session: Spectroscopy Six plenary lectures, by Kurt Wüthrich (2002 Physics Nobel laureate for the development of NMR spectroscopy), John Hall, Theodor Hänsch (both 2005 Physics Nobel laureates for their work on laser spectroscopy and the frequency comb), Douglas Osheroff (1996 Physics Nobel laureate for the discovery of the superfluidity of Helium 3), Roy Glauber (2005 Physics Nobel laureate for his work on the quantum theory of optical coherence) and James Cronin (1980 Physics Nobel laureate for the discovery of CP violations).

Wüthrich started with a rather energetic lecture on structural genomics — surprising the audience by taking his belt off to illustrate protein folding, a trick I might want to remember in the future. He emphasized the size of the gap between the large number of known protein sequences (14 million) and the much smaller number of known protein structures (75,000), not forgetting the abysmally smaller number of known protein functions. He made the case for the development of NMR and multi-dimensional NMR (ie. NMR repeated at incremental time points), which work on protein solutions, as a great alternative to X-ray crystallography (and anybody who has ever tried to make a protein crystallize can see why). Eventually he introduced TROSY, which doesn't have the traditional NMR drawbacks with respect to size and makes it possible to analyze large protein complexes, by these words: "We always talk about the relativity but Einstein also did important work," meaning the Stokes-Einstein equation on Brownian motion.

Hall then celebrated fifty years of laser, and marveled at the still growing excitement in the field. And yes, his slides, like those of Bill Phillips, were in Comic Sans.

He was followed by Hänsch and his lecture on laser spectroscopy. I strongly recommend you watch his amazing animation explaining frequency combs with a series of pendulums (which you can see starting at the 5th minute of this Youtube video if Silverlight is giving you trouble), which led the audience to applaud for the first time in the middle of a lecture. "How do you find something new?" Hänsch asked, and gave a fairly simple answer to that: either you look where no one has looked before, or you make more precise measurements — whenever you measure something to more decimal places than anybody before you, chances are you'll find something there.

The first half of the morning session was closed by Osheroff and his lecture on how science discoveries are made. Using examples from supraconductivity to cosmic microwave background radiation and, of course, the superfluidity of Helium 3, he gave the following advice: use the best instrumentation; don't reinvent the wheel; look into unexplored regions; failures are a hint to try something new; don't dismiss subtle unexplained behaviors; understand what it is that you are measuring; and rely on the scientific community. Piece o' pie! Osheroff also told us of running in the building at 2am after discovering BCS transition in liquid 3He, not finding anybody to share his results with, and calling his supervisor around 2:30am — about the same time he received a call from Stockholm 25 years later—, complete with a copy of his lab notebook from back then.

The coffee break followed, taken over by the live stream of the CERN press conference that was screened in the lecture hall. I think most of us would have preferred watching the scientific talk that was given before, but it was still a rather exciting moment! (By the way, the Higgs field as a crowd of journalist analogy comes from David Miller.)

Back to lectures, Glauber — "I'm well aware that this is not show business. But let me say this is certainly a hard act to follow." — gave a pretty interesting history of quantum mechanics through a list of apparent paradoxes of optics. Unfortunately he ran out of time and I was never able to figure out what the ghosts in his title (The Quantum Mechanics of Light: Interference, Entanglement — and Ghosts) were.

Cronin closed that intense morning session with a history of our understanding of cosmic rays, showing us how scientific theories evolve as we gather knowledge and experiments — "when you do physics for 50 years of your life, you realize it's not so easy to do".

Wednesday afternoon discussion with Kurt Wüthrich Well, we did talk a lot about NMR, X-ray crystallography, TROSY, and electron microscopy. A few bits of advice for physicists interested in biology: learn how to communicate with biologists; learn what the important problems are; your role is to develop techniques and tools — you need to figure out by yourself what for. Another piece of advice: studying publications for errors will give you some insights as to what to do next. Wüthrich is a big fan of regularly changing fields in your training (and I can't say I disagree with him on that point... we clearly have different views on open source software).

Lindauer Löwe

The Lion of Lindau

Thursday morning session Six plenary lectures by Dan Schechtman (2011 Chemistry Nobel laureate for his discovery of quasi-crystals), Dudley Hershbach (1986 Chemistry Nobel laureate, for the development of molecular beams to analyze the kinetics of chemical reactions), Erwin Neher (1991 Physiology/Medicine Nobel laureate for explaining the role of ion channels in cells), Robert Laughlin (1998 Physics Nobel laureate for the explanation of fractional quantum Hall effect), Walter Kohn (1998 Chemistry Nobel laureate for the development of density functional theory) and Sir Harold Kroto (1996 Chemistry Nobel laureate for the discovery of fullerenes, a.k.a. "the buckyball guy" in some circles).

I think my favorite of all the lectures was Schechtman's talk on his discovery of quasi-periodicity. Stunning science, complete with beautiful diffraction patterns and Fibonacci rabbits ("The greatest mathematician of all times, but Italians called him 'Blockhead'"), and the history that led from the initial disbelief of the community ("There are no quasi-crystals, but only quasi-scientists," said Linus Pauling) to his Nobel, both wrapped in Schechtman's talents as an orator, made for a wonderful lecture in my opinion. "A good scientist is a humble scientist," he said, addressing the redefinition of crystals by the International Union of Crystallography from "a regular repeating array of atoms" to "any solid having an essentially discrete diffraction pattern" (notice the vagueness of this "essentially"). "Choose something you like, become an expert at it, and you will have a wonderful career," Schechtman told us young researchers. I'll have this talk in mind next time I review a paper that presents itself as "paradigm shifting"!

The most acclaimed bit of advice to young researcher was however, without a doubt, Herschbach's closing advice of "Experiment!", very nicely illustrated by the eponymous Cole Porter song and dedicated to Schechtman. I do wonder however how many people in the audience realized at the time the song is from Nymph Errant and about a woman being advised to experiment with men... One sure thing: I've been humming "Experiment, make it your motto day and night, experiment, and it will take you to the light" to myself ever since. Herschbach started his lecture on ''Chemical Wizardry'' (an appealing title for this chemoinformatician) with a jab at MIT, for their logo with mens and manus looking away from each other instead of working together, then told us of three "molecular parables": the synthesis of palytoxin, the single biologically active form of a compound that has 5.10^21 isomers, akin to '"Beethoven writing a symphony"; how the lack of a single methyl group on a particular amino acid on the Y chromosome of a human fœtus results in an infertile female rather than a male, and how little estrogen and testosterone differ; and the synthesis of indigo, back in 1853, when chemists didn't know anything about atomic structure.

Neher followed with a lecture on the biophysics of neurotransmitter release (which I haven't found in the Mediatheque), preceded to match the spirit of previous lecturers by a history of neurotransmitter science from Galvani's experiments on frogs (which, to me, brought back long-forgotten memories of high school biology) to his discovery of sodium ion channel currents. He told us how ion channels opening and closing is linked to the perception of hot and cold (including heat detection in vampire bats) but also substances such as menthol or capsaicine (found in chili peppers), before moving on to his current research on synaptic plasticity.

After a very much needed coffee break (you know how you just end up getting less and less sleep at those events in spite of all your good intentions), Laughlin gave an energetic lecture on "powering the future" (not so incidentally the title of his book — "I have many hobbies, one of them writing books," he explained, before adding that writing books, with proper references of course, is often the best way to understand a topic). He insisted on the difference between "saving the Earth" and the issue of energy (which is all but a brief geological instant) and took us through a "science-fiction experiment" of imagining year 2200. Would people still drive cars, fly in airplanes and have light come up when pressing a switch? Most of the audience answered positively to these questions, without any regard for how this would be made possible. That the main reason for this answer boils down to "because we want to" demonstrates how political and economic the question is. "What happens if the lights go off?" he asked next, answering with the impeachment of Davis and the election of Schwarzenegger in California. Laughlin also made the case that ordinary fuels are physically optimal, comparing them to fat as a compact, efficient energy source, and pointed out that synthetic fuels are in his opinion the only way to keep airplanes flying.

Kohn then gave a never before seen lecture (also not in the Mediatheque at the time of this writing) on developing devices to compensate macular degeneration, a topic he has been working on for the past seven years, driven by his wife's affection by the illness. He described a software tool which presents the patient with a straight grid, which she will distort with the mouse until she sees it straight. The transformation can then be automatically applied to any image in a hand-held device akin to a magnifying glass.

Eventually Kroto gave the last lecture of the week ("three talks packed in one!"), about science education and the scientific method. He quoted Feynman, insisted on the importance of making use of our freedom to doubt, and to ask ourselves what we accept without evidence. From chilling pictures of the Creationism museum to one liners too numerous to list ("We introduced a new concept of TV debates: that participants should actually know something about the topic."), he kept his audience attention tight. His final advice to researchers? "Satisfy yourself, not your boss, and you'll do good." This was a great, enthusiastic talk, which I recommend even to not-scientists. Oh, and have a look at vega.org and GEOSET.

Thursday afternoon discussion with Erwin Neher We touched to a variety of topics, from vesicle recycling to man machine interfaces, the connectome, the universality of Ca++ as a messenger, patch clamps, free will, and neuromorphic chips. My favorite part was probably the conversation( initiated during this discussion and followed afterwards in the streets of Lindau) about simulating biologically realistic neural systems vs. developing artificial neural networks for optimal approximation of function.

Mainau Schmetterling

Butterfly on Mainau

Friday panel discussion: The Future of Energy Supply and Storage For the last scientific event of the meeting, we were all put on a huge boat to head to the island of Mainau where a panel discussion on the the future of energy supply and storage, moderated by Geoffrey Carr ( Science Editor at The Economist), and featuring Martin Keilhacker (German Physics Society), Carlo Rubbia, Georg Schütte (State Secretary of the German Ministry of Eduction and Research) and Robert Laughlin, was held. Taking in questions from the young researchers, they discussed the reality of phasing out nuclear energy in Germany, the role to be played by developing and third-world countries, safer nuclear energy, the difficulty of estimating the costs of various energy sources, and improving the yield of photovoltaic cells. Rubbia and Laughlin strongly disagreed on the future of superconductivity as a way to transport energy (Laughlin even qualifying the idea of stupid).

Mainau-Lindau

Bye-bye Mainau — The boat back to Lindau

Head to Flickr for more pictures of Lindau and Mainau (I only had my compact, so do not expect anything too impressive — of course there's always the #lnlm12 hashtag there as well).

Disclaimer — All view expressed my own (unless otherwise stated) and inaccuracies my sole responsibility.

2012-04-13

Upcoming Travels

Starting April 16, 2012 I will be away from Tübingen for six weeks, visiting the following people:

April 16 - May 4: Bertram Müller-Myhsok
Max Planck Institute of Psychiatry in Munich (Germany)

May 8 - May 14: Pierre Baldi
University of California Irvine, Irvine, CA (United States)

May 15 - May 18: Karsten Borgwardt and Mark Daly
Broad Institute & Massachusetts General Hospital, Boston, MA (United States)

May 21: Tijana Milenkovic
University of Notre-Dame, Notre-Dame, IN (United States)

May 24-24: Raphael Pelossof
Memorial Sloan-Kettering Cancer Center, New York, NY (United States)

Looking forward to both the science and the travels (especially as it will be my first time back in the U.S. since my PhD)!

\klo.e a.ɡat a.zɛn.kɔt\ (Klo-ay Ah-gat Ah-zen-kot) she/her Professor in machine learning for genomics at the Centre for Computational Biology (CBIO) of Mines Paris–PSL, Institut Curie and INSERM.

Email à donnees-personnelles-mes-droits@dgfip.finances.gouv.fr :

Texte de la plainte à la CNIL :

Note

\klo.e a.ɡat a.zɛn.kɔt\ (Klo-ay Ah-gat Ah-zen-kot)
she/her
Professor in machine learning for genomics at the Centre for Computational Biology (CBIO) of Mines Paris–PSL, Institut Curie and INSERM.