Research interests

I focus on the development of methods for feature selection in high-dimensional data. Essentially, my goal is to make sense of data with a small number of samples and a large number of variables. These variables can be clinical (such as age, cholesterol levels, or smoking history), genetic (such as gene expression, mutations, or epigenetic markers), or derived from electronic health records. How can we find out which of them play a role in a particular biological process or pathology? My work has numerous applications, in particular in precision medicine, where we try to develop treatments that are adapted to the (genetic) characteristics of patients, in contrast to a classical one-size-fits-all approach.

I am interested in the incorporation of additional (structured) information, for example in the form of biological networks; in multi-task approaches, where one addresses multiple related problems simultaneously; and in the development of non-linear approaches to model interactions between variables. In terms of machine learning, a lot of my work is linked to structured sparsity. This has led for example to the development of SConES (Selecting CONnected Explanatory SNPs), a method for network-guided multi-locus association mapping based on graph cuts. I am also very much interested in post-selection inference, which aims at obtaining valid p-values after a feature selection procedure.

I am also currently working on projects involving the analysis of various types of biological networks, the integration of different data types (multiview/multimodal learning), and the prediction of molecule-protein interactions.

Ongoing funded projects

  • STEVE: Advancing genotype to phenotype Studies by considering Transposable Elements Variability and Epivariability. ANR PRC, 2021 – 2025.
  • PRAIRIE Chair
  • MLFPM: Machine Learning Frontiers in Precision Medicine. H2020 Innovative Training Network, 2019 – 2023.
  • SCAPHE: Methods for discovering SNP Combinations Associated with a PHEnotype from genome-wide data. ANR JCJC, 2019 – 2022.

Publications

Textbook

  • Chloé-Agathe Azencott (first edition 2018, second edition 2022). Introduction au Machine Learning. Collection InfoSup, Dunod [book's webpage].

Preprints

  • Élise Dumas, Lucie Laot, Florence Coussy, Beatriz Grandal Rejo, Éric Daoud, Enora Laas, Amyn Kassara, Alena Majdling, Rayan Kabirian, Floriane Jochum, Paul Gougis, Sophie Michel, Sophie Houzard, Christine Le Bihan-Benjamin, Philippe-Jean Bousquet, Judicaël Hotton, Chloé-Agathe Azencott, Fabien Reyal, and Anne-Sophie Hamy (2022). The French Early Breast Cancer Cohort (FRESH): a resource for breast cancer research and evaluations of oncology practices based on the French National Healthcare System Database (SNDS). [medrxiv]
  • Eric Daoud, Anne-Sophie Hamy-Petit, Elise Dumas, Lidia Delrieu, Beatriz Grandal Rejo, Christine Le Bihan-Benjamin, Sophie Houzard, Philippe-Jean Bousquet, Judicaël Hotton, Aude-Marie Savoye, Christelle Jouannaud, Chloé-Agathe Azencott, Marc Lelarge, and Fabien Reyal (2021). Disparities in accessibility to oncology care centers in France. [medrxiv]
  • Abdelkader Behdenna, Julien Haziza, Chloé-Agathe Azencott and Akpéli Nordor (2021). pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. [biorxiv]
  • Héctor Climente-González and Chloé-Agathe Azencott (2021). martini: an R package for genome-wide association studies using SNP networks. [biorxiv]

Peer-reviewed Publications

  • Lotfi Slim, Hélène de Foucauld, Clément Chatelain and Chloé-Agathe Azencott (2022). A systematic analysis of gene-gene interaction in multiple sclerosis. [biorxiv]
  • Diane Duroux, Héctor Climente-González, Chloé-Agathe Azencott and Kristel Van Steen (2022). Interpretable network-guided epistasis detection. [link]
  • Asma Nouira and Chloé-Agathe Azencott (2022). Multitask group Lasso for Genome Wide Association Studies in diverse populations. [link] [biorxiv]
  • Lotfi Slim, Clément Chatelain, and Chloé-Agathe Azencott (2022). Nonlinear post-selection inference for genome-wide association studies. [link] [biorxiv]
  • Matthieu Najm, Chloé-Agathe Azencott, Benoît Playe, and Véronique Stoven (2021). Drug Target Identification with Machine Learning: How to Choose Negative Examples. [link] [biorxiv]
  • Yue Jiao, Fabienne Lesueur, Chloé-Agathe Azencott, Maïté Laurent, Noura Mebirouk, Lilian Laborde, Juana Beauvallet, Marie-Gabrielle Dondon, Séverine Eon-Marchais, Anthony Laugé, GEMO Study Collaborators, GENEPSO Study Collaborators, Catherine Noguès, Nadine Andrieu, Dominique Stoppa-Lyonnet and Sandrine M. Caputo (2021). A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers. [link]
  • Veronica Tozzo, Chloé-Agathe Azencott, Samuele Fiorini, Emanuele Fava, Andrea Trucco, Annalisa Barla (2021). Where do we stand in regularization for life science studies? [link]
  • Héctor Climente-González, Christine Lonjou, Fabienne Lesueur, GENESIS Study collaborators, Dominique Stoppa-Lyonnet, Nadine Andrieu, Chloé-Agathe Azencott (2021). Boosting GWAS using biological networks: A study on susceptibility to familial breast cancer. [link]
  • Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, Jean-Philippe Vert (2020). Novel methods for epistasis detection in genome-wide association studies. [link]
  • Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, and Jean-Philippe Vert (2019). kernelPSI: a post-selection inference framework for nonlinear variable selection. [link]
  • Héctor Climente-González, Chloé-Agathe Azencott, Samuel Kaski and Makoto Yamada (2019). Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data. [link]
  • Benoît Playe, Chloé-Agathe Azencott, and Véronique Stoven (2018). Efficient multi-task chemogenomics for drug specificity prediction. [link]
  • Chloé-Agathe Azencott (2018). Machine learning and genomics: precision medicine versus patient privacy. [link] [arXiv]
  • Chloé-Agathe Azencott, Tero Aittokallio, Sushmita Roy, Thea Norman, Stephen Friend, Gustavo Stolovitzky, Anna Goldenberg, and DREAM Idea Challenge Consortium (2017). The inconvenience of data of convenience: computational research beyond post-mortem analyses. [link] [SharedIt] [pdf]
  • Solveig K. Sieberts, Fan Zhu, Javier García-García, Eli Stahl, Abhishek Pratap, Gaurav Pandey, ..., Lara M. Mangravite (2016). Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. [link]
  • Chloé-Agathe Azencott (2016). Network-guided biomarker discovery, in Lecture Notes in Computer Science 9605 State-of-the-Art Volume on Machine Learning for Health Informatics, A. Holzinger Editor, Springer. [link] [arxiv]
  • Víctor Bellón, Véronique Stoven, and Chloé-Agathe Azencott (2016). Multitask feature selection with task descriptors. [link]
  • Federica Eduati, Lara M Mangravite, Tao Wang, Hao Tang, J. Christopher Bare, Ruili Huang, ..., Julio Saez-Rodriguez (2015). Prediction of human population responses to toxic compounds by a collaborative competition. [link]
  • Dominik G. Grimm, Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G. MacArthur, Kaitlin E. Samocha, David N. Cooper, Peter D. Stenson, Mark J. Daly, Jordan W. Smoller, Laramie E. Duncan, Karsten M. Borgwardt (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. [link]
  • Mahito Sugiyama, Chloé-Agathe Azencott, Dominik Grimm, Yoshinobu Kawahara, and Karsten M. Borgwardt (2014). Multi-task feature selection on multiple networks via maximum flows. [link] [pdf] [supplementary pdf]
  • Chloé-Agathe Azencott, Dominik Grimm, Mahito Sugiyama, Yoshinobu Kawahara, and Karsten M. Borgwardt (2013). Efficient network-guided multi-locus association mapping with graph cuts. [link] [pdf]
  • Tony Kam-Thong, Chloé-Agathe Azencott, Lawrence Cayton, Benno Pütz, André Altmann, Nazanin Karbalai, Philipp G. Sämann, Bernhard Schölkopf, Bertram Müller-Myhsok, and Karsten M. Borgwardt (2012). GLIDE: GPU-based linear regression for the detection of epistasis. [link] [pdf]
  • Matthew A. Kayala, Chloé-Agathe Azencott, Jonathan H. Chen, and Pierre Baldi (2011). Learning to predict chemical reactions. [link] [pdf]
  • Pierre Baldi, Chloé-Agathe Azencott, and S. Joshua Swamidass (2011). Bridging the gap between neural network and kernel methods: applications to drug discovery. [pdf]
  • Chloé-Agathe Azencott and Pierre Baldi (2011). Virtual high-throughput screening with two-dimensional kernels, in Hands-On Pattern Recognition: Challenges in Machine Learning, 1, pp. 131–146, I. Guyon, G. Cawley, G. Dror, and A. Saffari Editors. [link] [pdf]
  • S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi (2010). A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. [link] [pdf]
  • S. Joshua Swamidass, Chloé-Agathe Azencott, Ting-Wan Lin, Hugo Gramajo, Sheryl Tsai, and Pierre Baldi (2009). The Influence Relevance Voter: an accurate and interpretable virtual high-throughput screening method. [link] [pdf]
  • Chloé-Agathe Azencott, Alexandre Ksikes, S. Joshua Swamidass, Jonathan H. Chen, Liva Ralaivola, and Pierre Baldi (2007). One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical and biological properties. [link] [pdf]

Monographs

  • Chloé-Agathe Azencott (2019). Machine learning tools for biomarker discovery, Sorbonne Université, HDR dissertation. tel-02354924 (defended January 30, 2020).
  • Chloé-Agathe Azencott (2010). Statistical machine learning and data mining for chemoinformatics and drug discovery, PhD dissertation, University of California, Irvine. ProQuest/UMI AAT 3422105. [pdf] (defended August 31, 2010).

Conference Abstracts

  • Asma Nouira and Chloé-Agathe Azencott. Multitask group Lasso for Genome Wide Association Studies in diverse populations. MLCSB track at ISMB, 2021 (oral)
  • Lotfi Slim, Clément Chatelain and Chloé-Agathe Azencott. Nonlinear post-selection inference for genome-wide association studies. MLCB, 2020 (spotlight)
  • Asma Nouira and Chloé-Agathe Azencott. Multitask group lasso for genome-wide association studies. SMPGD, 2020 (poster)
  • Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott and Jean-Philippe Vert. kernelPSI: a powerful post-selection inference framework for nonlinear association testing in genome-wide association studies. ProbGen, 2019 (oral)
  • Stefani Dritsa, Thibaud Martinez, Weiyi Zhang, Chloé-Agathe Azencott and Antonio Rausell. Prediction of candidate disease genes through deep learning on multiplex biological networks. JOBIM, 2019 (poster)
  • Christophe Le Priol, Chloé-Agathe Azencott and Xavier Gidrol. Large-scale RNA-seq datasets enable the detection of genes with a differential expression dispersion in cancer. JOBIM, 2019 (poster)
  • Diane Duroux*, Héctor Climente-González*, Aldo Camargo, Lars Wienbrandt, David Ellinghaus, Chloé-Agathe Azencott and Kristel Van Steen. Improving efficiency in epistasis detection with a gene-based analysis using functional filters. 28th International Genetic Epidemiology Society meeting, 2019.
  • Héctor Climente-González, Christine Lonjou, Fabienne Lesueur, Dominique Stoppa-Lyonnet, Nadine Andrieu, Chloé-Agathe Azencott, GENESIS investigators. Judging genetic loci by the company they keep: Comparing network-based methods for biomarker discovery in familial breast cancer. 68th Annual Meeting of the American Society of Human Genetics, 2018 (poster)
  • Héctor Climente-González and Chloé-Agathe Azencott. R package for network-guided Genome-Wide Association Studies, ISMB NetBio, 2017 (poster).
  • Christophe Le Priol, Laurent Guyon, Chloé-Agathe Azencott, and Xavier Gidrol. Analysis of microRNA sequences identifies conserved families of microRNAs, JOBIM, 2016 (poster).
  • Chloé-Agathe Azencott, S. Joshua Swamidass and Pierre Baldi. Virtual high-throughput screening and early recognition, Women in Machine Learning Workshop, 2009 (poster).
  • Chloé-Agathe Azencott, S. Joshua Swamidass and Pierre Baldi. Virtual high-throughput screening and early recognition, The Learning Workshop, 2009 (poster).
  • Chloé-Agathe Azencott and Pierre Baldi. Kernels for predictive regression--physical, biological and chemical properties of small molecules, Workshop for Women in Machine Learning, 2006 (oral).