Research
Research interests
I focus on the development of methods for feature selection in high-dimensional data. Essentially, my goal is to make sense of data with a small number of samples and a large number of variables. These variables can be clinical (such as age, cholesterol levels or smoking history), genetic (such as gene expression, mutations, or epigenetic markers), or derived from electronic health records. How can we find out which of them play a role in a particular biological process or pathology? My work has numerous applications, in particular in precision medicine, where we try to develop treatments that are adapted to the (genetic) characteristics of patients, in contrast to a classical one-size-fits-all approach.
I am interested in the incorporation of additional (structured) information, for example in the form of biological networks; in multi-task approaches, where one addresses multiple related problems simultaneously; and in the development of non-linear approaches to model interactions between variables. In terms of machine learning, a lot of my work is linked to structured sparsity. This has led, for example, to the development of SConES (Selecting CONnected Explanatory SNPs), a method for network-guided multi-locus association mapping based on graph cuts. I am also very much interested in post-selection inference, which aims at obtaining valid p-values after a feature selection procedure.
I am also currently working on projects involving the analysis of various types of biological networks, the integration of different data types (multiview/multimodal learning), and the prediction of molecule-protein interactions.
Funded projects
Ongoing
- Advanced statistical machine learning methods for determining genotype-phenotype associations from genome-wide biobank data. Collaboration with Janssen Research & Development, 2022 – 2024.
- STEVE: Advancing genotype to phenotype Studies by considering Transposable Elements Variability and Epivariability. ANR PRC, 2021 – 2025.
- PRAIRIE Chair.
Older
- MLFPM: Machine Learning Frontiers in Precision Medicine. H2020 Innovative Training Network, 2019 – 2023.
- SCAPHE: Methods for discovering SNP Combinations Associated with a PHEnotype from genomewide data. ANR JCJC, 2019 – 2022.
- Training distributed models. Collaboration with SANCARE, 2018 – 2020.
- Machine learning for genome-wide association studies. Collaboration with SANOFI, 2016 – 2019.
Publications
Textbook
- Chloé-Agathe Azencott (first edition 2018, second edition 2022). Introduction au Machine Learning. Collection InfoSup, Dunod [book's webpage].
Preprints
- Ndèye Maguette Mbaye, Michael Danzinger, Aullène Toussaint, Élise Dumas, Julien Guérin, Anne-Sophie Hamy-Petit, Fabien Reyal, Michal Rosen-Zvi and Chloé-Agathe Azencott (2024). Multimodal BEHRT: Transformers for Multimodal Electronic Health Records to predict breast cancer prognosis. [medrxiv]
- Marc Michel, Maryam Heidary, Anissa Mechri, Kévin Da Silva, Marine Gorse, Victoria Dixon, Klaus von Grafenstein, Caroline Hego, Aurore Rampanou, Constance Lamy, Maud Kamal, Christophe Le Tourneau, Mathieu Séné, Ivan Bièche, Cecile Reyes, David Gentien, Marc-Henri Stern, Olivier Lantz, Luc Cabel, Jean-Yves Pierga, François-Clément Bidard, Chloé-Agathe Azencott and Charlotte Proudhon (2024). Non-invasive multi-cancer diagnosis using DNA hypomethylation of LINE-1 retrotransposons. [medrxiv]
- Christophe Poulet, Ahmed Debit, Claire Josse, Guy Jerusalem, Chloé-Agathe Azencott, Vincent Bours and Kristel Van Steen (2023). Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery. [biorxiv]
- Elise Dumas, Anne-Sophie Hamy, Sophie Houzard, Eva Hernandez, Aullène Toussaint, Julien Guerin, Laetitia Chanas, Victoire de Castelbajac, Mathilde Saint-Ghislain, Beatriz Grandal, Eric Daoud, Fabien Reyal and Chloé-Agathe Azencott (2022). EDEN: An Event DEtection Network for the annotation of Breast Cancer recurrences in administrative claims data. [arxiv]
- Eric Daoud, Anne-Sophie Hamy-Petit, Elise Dumas, Lidia Delrieu, Beatriz Grandal Rejo, Christine Le Bihan-Benjamin, Sophie Houzard, Philippe-Jean Bousquet, Judicaël Hotton, Aude-Marie Savoye, Christelle Jouannaud, Chloé-Agathe Azencott, Marc Lelarge and Fabien Reyal (2021). Disparities in accessibility to oncology care centers in France. [medrxiv]
- Héctor Climente-González and Chloé-Agathe Azencott (2021). martini: an R package for genome-wide association studies using SNP networks. [biorxiv]
Peer-reviewed Publications
- Gwenn Guichaoua, Philippe Pinel, Brice Hoffmann, Chloé-Agathe Azencott and Véronique Stoven (2024). Advancing drug-target interactions prediction: leveraging a large-scale dataset with a rapid and robust chemogenomic algorithm. [link] [biorxiv]
- Élise Dumas, Beatriz Grandal Rejo, Paul Gougis, Sophie Houzard, Judith Abécassis, Floriane Jochum, Benjamin Marande, Annabelle Ballesta, Elaine Del Nery, Thierry Dubois, Samar Alsafadi, Bernard Asselain, Aurélien Latouche, Marc Espie, Enora Laas, Florence Coussy, Clémentine Bouchez, Jean-Yves Pierga, Christine Le Bihan-Benjamin, Philippe-Jean Bousquet, Judicaël Hotton, Chloé-Agathe Azencott, Fabien Reyal, and Anne-Sophie Hamy (2024). Concomitant medication, comorbidity and survival in patients with breast cancer. [link] [companion website (Adrenaline)]
- Abdelkader Behdenna, Maximilien Colange, Julien Haziza, Aryo Gema, Guillaume Appé, Chloé-Agathe Azencott and Akpéli Nordor (2023). pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. [link] [biorxiv]
- Christophe Le Priol, Chloé-Agathe Azencott, and Xavier Gidrol (2023). Detection of genes with differential expression dispersion unravels the role of autophagy in cancer progression. [link]
- Héctor Climente-González, Chloé-Agathe Azencott, Makoto Yamada (2023). A network-guided protocol to discover susceptibility genes in genome-wide association studies using stability selection. [link]
- Charles Vesteghem, Weronika M. Szejniuk, Rasmus F. Brøndum, Ursula G. Falkmer, Chloé-Agathe Azencott, and Martin Bøgsted (2022). Dynamic risk prediction of 30-day mortality of patients with advanced lung cancer: Comparing 5 machine learning approaches. [link]
- Élise Dumas, Lucie Laot, Florence Coussy, Beatriz Grandal Rejo, Éric Daoud, Enora Laas, Amyn Kassara, Alena Majdling, Rayan Kabirian, Floriane Jochum, Paul Gougis, Sophie Michel, Sophie Houzard, Christine Le Bihan-Benjamin, Philippe-Jean Bousquet, Judicaël Hotton, Chloé-Agathe Azencott, Fabien Reyal, and Anne-Sophie Hamy (2022). The French Early Breast Cancer Cohort (FRESH): a resource for breast cancer research and evaluations of oncology practices based on the French National Healthcare System Database (SNDS). [link]
- Lotfi Slim, Hélène de Foucauld, Clément Chatelain and Chloé-Agathe Azencott (2022). A systematic analysis of gene-gene interaction in multiple sclerosis. [link]
- Diane Duroux, Héctor Climente-González, Chloé-Agathe Azencott and Kristel Van Steen (2022). Interpretable network-guided epistasis detection. [link]
- Asma Nouira and Chloé-Agathe Azencott (2022). Multitask group Lasso for Genome Wide Association Studies in diverse populations. [link] [biorxiv]
- Lotfi Slim, Clément Chatelain, and Chloé-Agathe Azencott (2022). Nonlinear post-selection inference for genome-wide association studies. [link] [biorxiv]
- Matthieu Najm, Chloé-Agathe Azencott, Benoît Playe, and Véronique Stoven (2021). Drug Target Identification with Machine Learning: How to Choose Negative Examples. [link] [biorxiv]
- Yue Jiao, Fabienne Lesueur, Chloé-Agathe Azencott, Maïté Laurent, Noura Mebirouk, Lilian Laborde, Juana Beauvallet, Marie-Gabrielle Dondon, Séverine Eon-Marchais, Anthony Laugé, GEMO Study Collaborators, GENEPSO Study Collaborators, Catherine Noguès, Nadine Andrieu, Dominique Stoppa-Lyonnet and Sandrine M. Caputo (2021). A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers. [link]
- Veronica Tozzo, Chloé-Agathe Azencott, Samuele Fiorini, Emanuele Fava, Andrea Trucco, Annalisa Barla (2021). Where do we stand in regularization for life science studies? [link]
- Héctor Climente-González, Christine Lonjou, Fabienne Lesueur, GENESIS Study collaborators, Dominique Stoppa-Lyonnet, Nadine Andrieu, Chloé-Agathe Azencott (2021). Boosting GWAS using biological networks: A study on susceptibility to familial breast cancer. [link]
- Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, Jean-Philippe Vert (2020). Novel methods for epistasis detection in genome-wide association studies. [link]
- Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, and Jean-Philippe Vert (2019). kernelPSI: a post-selection inference framework for nonlinear variable selection. [link]
- Héctor Climente-González, Chloé-Agathe Azencott, Samuel Kaski and Makoto Yamada (2019). Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data. [link]
- Benoît Playe, Chloé-Agathe Azencott, and Véronique Stoven (2018). Efficient multi-task chemogenomics for drug specificity prediction. [link]
- Chloé-Agathe Azencott (2018). Machine learning and genomics: precision medicine versus patient privacy. [link] [arXiv]
- Chloé-Agathe Azencott, Tero Aittokallio, Sushmita Roy, Thea Norman, Stephen Friend, Gustavo Stolovitzky, Anna Goldenberg, and DREAM Idea Challenge Consortium (2017). The inconvenience of data of convenience: computational research beyond post-mortem analyses. [link] [SharedIt] [pdf]
- Solveig K. Sieberts, Fan Zhu, Javier García-García, Eli Stahl, Abhishek Pratap, Gaurav Pandey, ..., Lara M. Mangravite (2016). Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. [link]
- Víctor Bellón, Véronique Stoven, and Chloé-Agathe Azencott (2016). Multitask feature selection with task descriptors. [link]
- Federica Eduati, Lara M Mangravite, Tao Wang, Hao Tang, J. Christopher Bare, Ruili Huang, ..., Julio Saez-Rodriguez (2015). Prediction of human population responses to toxic compounds by a collaborative competition. [link]
- Dominik G. Grimm, Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G. MacArthur, Kaitlin E. Samocha, David N. Cooper, Peter D. Stenson, Mark J. Daly, Jordan W. Smoller, Laramie E. Duncan, Karsten M. Borgwardt (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. [link]
- Mahito Sugiyama, Chloé-Agathe Azencott, Dominik Grimm, Yoshinobu Kawahara, and Karsten M. Borgwardt (2014). Multi-task feature selection on multiple networks via maximum flows. [link] [pdf] [supplementary pdf]
- Chloé-Agathe Azencott, Dominik Grimm, Mahito Sugiyama, Yoshinobu Kawahara, and Karsten M. Borgwardt (2013). Efficient network-guided multi-locus association mapping with graph cuts. [link] [pdf]
- Tony Kam-Thong, Chloé-Agathe Azencott, Lawrence Cayton, Benno Pütz, André Altmann, Nazanin Karbalai, Philipp G. Sämann, Bernhard Schölkopf, Bertram Müller-Myhsok, and Karsten M. Borgwardt (2012). GLIDE: GPU-based linear regression for the detection of epistasis. [link] [pdf]
- Matthew A. Kayala, Chloé-Agathe Azencott, Jonathan H. Chen, and Pierre Baldi (2011). Learning to predict chemical reactions. [link] [pdf]
- Pierre Baldi, Chloé-Agathe Azencott, and S. Joshua Swamidass (2011). Bridging the gap between neural network and kernel methods: applications to drug discovery. [pdf]
- Chloé-Agathe Azencott and Pierre Baldi (2011). Virtual high-throughput screening with two-dimensional kernels, in Hands-On Pattern Recognition: Challenges in Machine Learning, vol. 1, pp. 131–146, I. Guyon, G. Cawley, G. Dror, and A. Saffari (editors). [link] [pdf]
- S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi (2010). A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. [link] [pdf]
- S. Joshua Swamidass, Chloé-Agathe Azencott, Ting-Wan Lin, Hugo Gramajo, Sheryl Tsai, and Pierre Baldi (2009). The Influence Relevance Voter: an accurate and interpretable virtual High throughput screening method. [link] [pdf]
- Chloé-Agathe Azencott, Alexandre Ksikes, S. Joshua Swamidass, Jonathan H. Chen, Liva Ralaivola, and Pierre Baldi (2007). One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical and biological properties. [link] [pdf]
Monographs
- Chloé-Agathe Azencott (2019). Machine learning tools for biomarker discovery, Sorbonne Université, HDR dissertation. tel-02354924 (defended January 30, 2020).
- Chloé-Agathe Azencott (2010). Statistical machine learning and data mining for chemoinformatics and drug discovery, PhD dissertation, University of California, Irvine. ProQuest/UMI AAT 3422105. [pdf] (defended August 31, 2010).
Conference Abstracts
- Charlotte Proudhon, Marc Michel, Maryam Heidary, Anissa Mechri, Caroline Hego, Aurore Rampanou, Christophe Le Tourneau, Maud Kamal, Ivan Bieche, Marc-Henri Stern, Olivier Lantz, Luc Cabel, Jean-Yves Pierga, François-Clément Bidard and Chloé-Agathe Azencott. Hypomethylation of circulating retrotransposons: Towards a non-invasive pan-cancer diagnosis, MAP Onco, 2022 (mini oral)
- Elise Dumas, Anne-Sophie Hamy, Sophie Houzard, Eva Hernandez, Aullène Toussaint, Julien Guerin, Laetitia Chanas, Victoire de Castelbajac, Mathilde Saint-Ghislain, Beatriz Grandal, Eric Daoud, Fabien Reyal and Chloé-Agathe Azencott. EDEN: An Event DEtection Network for the annotation of Breast Cancer recurrences in administrative claims data, ML4Health, 2022
- Éric Daoud, Anne-Sophie Hamy, Elise Dumas, Lidia Delrieu, Beatriz Grandal, Christine Le Bihan-Benjamin, Sophie Houzard, Philippe-Jean Bousquet, Judicael Hotton, Aude-Marie Savoye, Christelle Jouannaud, Chloé-Agathe Azencott, Marc Lelarge, and Fabien Reyal. Disparities in accessibility to oncology care centers in France, AACR Annual Meeting, 2022
- Élise Dumas, Beatriz Grandal, Lucie Laot, Eric Daoud, Lidia Delrieu, Marc Espié, Sophie Houzard, Christine Le Bihan-Benjamin, Philippe-Jean Bousquet, Elodie Anthony, Aurélien Latouche, Nadir Sella, Thierry Dubois, Annabelle Ballesta, Amyn Kassara, Elaine Del Nery, Benjamin Marande, Samar Alsafadi, Paul Gougis, Chloé-Agathe Azencott, Fabien Reyal and Anne-Sophie Hamy. ADRENALINE, an atlas for drug and breast cancer survival interaction: Comedications at diagnosis and impact on breast cancer mortality of the French breast cancer cohort (n=235,375), AACR Annual Meeting, 2022
- Asma Nouira and Chloé-Agathe Azencott. Multitask group Lasso for Genome Wide Association Studies in diverse populations, MLCSB track at ISMB, 2021 (oral)
- Lotfi Slim, Clément Chatelain and Chloé-Agathe Azencott. Nonlinear post-selection inference for genome-wide association studies, MLCB, 2020 (spotlight)
- Asma Nouira and Chloé-Agathe Azencott. Multitask group lasso for genome-wide association studies, SMPGD, 2020 (poster)
- Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott and Jean-Philippe Vert. kernelPSI: a powerful post-selection inference framework for nonlinear association testing in genome-wide association studies, ProbGen, 2019 (oral)
- Stefani Dritsa, Thibaud Martinez, Weiyi Zhang, Chloé-Agathe Azencott and Antonio Rausell. Prediction of candidate disease genes through deep learning on multiplex biological networks. JOBIM, 2019 (poster)
- Christophe Le Priol, Chloé-Agathe Azencott and Xavier Gidrol. Large-scale RNA-seq datasets enable the detection of genes with a differential expression dispersion in cancer. JOBIM, 2019 (poster)
- Diane Duroux*, Héctor Climente-González*, Aldo Camargo, Lars Wienbrandt, David Ellinghaus, Chloé-Agathe Azencott and Kristel Van Steen. Improving efficiency in epistasis detection with a gene-based analysis using functional filters, 28th International Genetic Epidemiology Society meeting, 2019.
- Héctor Climente-González, Christine Lonjou, Fabienne Lesueur, Dominique Stoppa-Lyonnet, Nadine Andrieu, Chloé-Agathe Azencott, GENESIS investigators. Judging genetic loci by the company they keep: Comparing network-based methods for biomarker discovery in familial breast cancer, 68th Annual Meeting of the American Society of Human Genetics, 2018 (poster)
- Héctor Climente-González and Chloé-Agathe Azencott. R package for network-guided Genome-Wide Association Studies, ISMB NetBio, 2017 (poster).
- Christophe Le Priol, Laurent Guyon, Chloé-Agathe Azencott, and Xavier Gidrol. Analysis of microRNA sequences identifies conserved families of microRNAs, JOBIM, 2016 (poster).
- Víctor Bellón, Véronique Stoven, and Chloé-Agathe Azencott. DREAM Rheumatoid Arthritis Responder Challenge: Team Lucia, RECOMB/ISCB Conference on Regulatory & Systems Genomics; DREAM Challenges & Cytoscape Workshops, 2014 (poster).
- Chloé-Agathe Azencott, Dominik Grimm, Jordan Smoller, Laramie Duncan and Karsten M. Borgwardt. Beware of circularity: A critical assessment of the state of the art in deleteriousness prediction of missense variants, 64th Annual Meeting of The American Society of Human Genetics, 2014 (oral).
- Chloé-Agathe Azencott, Dominik Grimm, Yoshinobu Kawahara and Karsten M. Borgwardt. Efficiently mapping phenotypes to networks of genetic loci, Machine Learning in Computational Biology Workshop, 2012 (poster).
- Chloé-Agathe Azencott, Matthew A. Kayala, and Pierre Baldi. PropOrb: a frontier molecular orbital interaction proposer, 239th American Chemical Society National Meeting, 2010 (oral).
- Matthew A. Kayala, Chloé-Agathe Azencott, Jonathan H. Chen, and Pierre Baldi. OrbDB: A database of molecular orbital interactions, 239th American Chemical Society National Meeting, 2010 (oral).
- Chloé-Agathe Azencott, S. Joshua Swamidass and Pierre Baldi. Virtual high-throughput screening and early recognition, Women in Machine Learning Workshop, 2009 (poster).
- Chloé-Agathe Azencott, S. Joshua Swamidass and Pierre Baldi. Virtual high-throughput screening and early recognition, The Learning Workshop, 2009 (poster).
- Chloé-Agathe Azencott, Matthew A. Kayala, and Pierre Baldi. Combining quantitative data and qualitative knowledge to score reaction energies, 237th American Chemical Society National Meeting, 2009 (oral).
- Chloé-Agathe Azencott and Pierre Baldi. Virtual high-throughput screening with two-dimensional kernels, Agnostic Learning vs. Prior Knowledge Workshop, 2007 (oral).
- Chloé-Agathe Azencott and Pierre Baldi. Kernels for predictive regression--physical, biological and chemical properties of small molecules, Workshop for Women in Machine Learning, 2006 (oral).