Code
My code is hosted on GitHub.
Things you'll find there:
Here is a list of projects I have contributed to in the context of my research:
Multitask lassos for GWAS
— Multitask Lasso with task descriptors: a multitask Lasso approach that makes use of task descriptors. The code, developed by Víctor Bellón, is available on GitHub.
Reference:
Víctor Bellón, Véronique Stoven, and Chloé-Agathe Azencott (2016). Multitask feature selection with task descriptors. [link]
— MuGLasso: uses multitask group lasso to analyze GWAS data across diverse populations. The code, developed by Asma Nouira, is available on GitHub.
Reference:
Asma Nouira and Chloé-Agathe Azencott (2022). Multitask group Lasso for Genome Wide Association Studies in diverse populations. [link] [biorxiv]
Kernel post-selection inference
— kernelPSI: Performing post-selection inference on kernel selection. The code, developed mainly by Lotfi Slim, is available on GitHub and also contains a GPU implementation of HSIC.
References:
- Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, and Jean-Philippe Vert (2019). kernelPSI: a post-selection inference framework for nonlinear variable selection. [link]
- Lotfi Slim, Clément Chatelain, and Chloé-Agathe Azencott (2022). Nonlinear post-selection inference for genome-wide association studies. [link] [biorxiv]
Epistasis tools
— GLIDE (GPU-based Linear Detection of Epistasis): a GPU-based approach for the detection of epistasis in genome-wide data. It systematically computes a linear regression between each pair of genetic loci and a phenotype. (It has mostly been made obsolete by more recent NVIDIA libraries.) The CUDA code (for execution on NVIDIA GPUs), developed by Tony Kam-Thong and myself, is available on GitHub. An additional how-to, together with Python and bash scripts for working with GLIDE that I developed in the context of a case study, is also available on GitHub.
Reference:
Tony Kam-Thong, Chloé-Agathe Azencott, Lawrence Cayton, Benno Pütz, André Altmann, Nazanin Karbalai, Philipp G. Sämann, Bernhard Schölkopf, Bertram Müller-Myhsok, and Karsten M. Borgwardt (2012). GLIDE: GPU-based linear regression for the detection of epistasis. [link] [pdf]
— epiGWAS: Functions to perform robust epistasis detection in genome-wide association studies based on ideas from causal inference. The code, developed mainly by Lotfi Slim, is available on GitHub.
References:
- Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, and Jean-Philippe Vert (2020). Novel methods for epistasis detection in genome-wide association studies. [link]
- Lotfi Slim, Hélène de Foucauld, Clément Chatelain and Chloé-Agathe Azencott (2022). A systematic analysis of gene-gene interaction in multiple sclerosis. [link]
— network-guided epistasis detection: See gwas-tools below.
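To make the pairwise regression described for GLIDE above more concrete, here is a minimal CPU sketch in NumPy. It is not the GLIDE CUDA code: the function name `pairwise_interaction_tstats` is hypothetical, and it assumes that the quantity of interest for each pair of loci is the t-statistic of the interaction term in a linear model with two main effects and their product.

```python
import itertools

import numpy as np


def pairwise_interaction_tstats(genotypes, phenotype):
    """Illustrative helper (not part of GLIDE).

    For every pair of SNPs (i, j), fit by ordinary least squares
        y = b0 + b1 * x_i + b2 * x_j + b3 * x_i * x_j
    and return {(i, j): t-statistic of b3}.
    """
    n_samples, n_snps = genotypes.shape
    results = {}
    for i, j in itertools.combinations(range(n_snps), 2):
        xi, xj = genotypes[:, i], genotypes[:, j]
        design = np.column_stack([np.ones(n_samples), xi, xj, xi * xj])
        coef, _, rank, _ = np.linalg.lstsq(design, phenotype, rcond=None)
        if rank < design.shape[1]:
            continue  # skip degenerate pairs (e.g. a constant SNP)
        residuals = phenotype - design @ coef
        dof = n_samples - design.shape[1]
        sigma2 = residuals @ residuals / dof  # residual variance estimate
        cov = sigma2 * np.linalg.inv(design.T @ design)
        results[(i, j)] = coef[3] / np.sqrt(cov[3, 3])
    return results


if __name__ == "__main__":
    # Synthetic data (assumption): 5 SNPs coded 0/1/2, with an
    # interaction between SNP 0 and SNP 1 driving the phenotype.
    rng = np.random.default_rng(0)
    G = rng.integers(0, 3, size=(500, 5)).astype(float)
    y = 2.0 * G[:, 0] * G[:, 1] + rng.normal(size=500)
    stats = pairwise_interaction_tstats(G, y)
    print(max(stats, key=lambda pair: abs(stats[pair])))
```

The loop visits every pair, so the cost grows quadratically with the number of loci; that exhaustive scan is the kind of workload that motivates running it on a GPU.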
GWAS incorporating networks
— SConES (Selecting Connected Explanatory SNPs): a network-guided approach for analyzing genome-wide data. It allows for the discovery of multiple genetic loci that are maximally associated with a phenotype, while tending to be connected on a given biological network. This network can be constructed, for example, from a gene-gene interaction network (based on proximity), or in any other way that encodes your belief that neighboring SNPs should tend to be selected together.
Matlab code developed by Dominik Grimm, Yoshinobu Kawahara and myself is available on GitHub.
SConES is also available as part of EasyGWAS, a framework for the analysis and meta-analysis of GWAS data (with Python interfaces).
Reference:
Chloé-Agathe Azencott, Dominik Grimm, Mahito Sugiyama, Yoshinobu Kawahara, and Karsten M. Borgwardt (2013). Efficient network-guided multi-locus association mapping with graph cuts. [link] [pdf]
— Multi-SConES is a multi-phenotype version of SConES. R code, developed by Mahito Sugiyama, is available on GitHub.
Reference:
Mahito Sugiyama, Chloé-Agathe Azencott, Dominik Grimm, Yoshinobu Kawahara, and Karsten M. Borgwardt (2014). Multi-task feature selection on multiple networks via maximum flows. [link] [pdf] [supplementary pdf]
— martini: A Bioconductor package for using biological networks to guide GWAS, based on SConES (see above). This package was developed mainly by Héctor Climente.
Reference:
Héctor Climente-González and Chloé-Agathe Azencott (2021). martini: an R package for genome-wide association studies using SNP networks. [biorxiv]
— gwas-tools: An R package to discover susceptibility genes in GWAS using network-guided approaches. The code, developed mainly by Héctor Climente, with Diane Duroux for the epistasis detection, is available on GitHub.
References:
- Héctor Climente-González, Chloé-Agathe Azencott, and Makoto Yamada (2023). A network-guided protocol to discover susceptibility genes in genome-wide association studies using stability selection. [link]
- Héctor Climente-González, Christine Lonjou, Fabienne Lesueur, GENESIS Study collaborators, Dominique Stoppa-Lyonnet, Nadine Andrieu, and Chloé-Agathe Azencott (2021). Boosting GWAS using biological networks: A study on susceptibility to familial breast cancer. [link]
- Diane Duroux, Héctor Climente-González, Chloé-Agathe Azencott and Kristel Van Steen (2022). Interpretable network-guided epistasis detection. [link]
— dmGWAS_2.3.1: a version of dmGWAS_2.3 that is compatible with igraph 0.7, available here.
RNAseq differential dispersion
Code in R to simulate and measure differential dispersion in RNAseq data, written by Christophe Le Priol and available on GitHub.
Reference:
Christophe Le Priol, Chloé-Agathe Azencott, and Xavier Gidrol (2023). Detection of genes with differential expression dispersion unravels the role of autophagy in cancer progression. [link]
PyComBat
A Python implementation of ComBat for batch effects correction in high-throughput molecular data using empirical Bayes methods, written mostly by Abdelkader Behdenna and Maximilien Colange, available as part of the inmoose package on PyPI.
Reference:
Abdelkader Behdenna, Maximilien Colange, Julien Haziza, Aryo Gema, Guillaume Appé, Chloé-Agathe Azencott and Akpéli Nordor (2023). pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. [link]
Drug-target interaction prediction
— Komet: a Kronecker-optimized method for drug-target interaction prediction. The code, developed by Gwenn Guichaoua, is available on GitHub.
Reference:
Gwenn Guichaoua, Philippe Pinel, Brice Hoffmann, Chloé-Agathe Azencott and Véronique Stoven (2024). Advancing drug-target interactions prediction: leveraging a large-scale dataset with a rapid and robust chemogenomic algorithm. [biorxiv]
CROC
A Python package for calculating ROC curves and Concentrated ROC (CROC) curves, written by S. Josh Swamidass and available on PyPI.
Reference:
S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi (2010). A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. [link] [pdf]
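As a rough illustration of what a concentrated ROC curve is, the sketch below computes ROC points from scores and labels, then magnifies the early part of the false-positive axis with an exponential transform. It does not use the CROC package or its API: all function names are hypothetical, the value of `alpha` is an arbitrary illustrative choice, and tied scores are not handled.

```python
import numpy as np


def roc_points(scores, labels):
    """ROC points (fpr, tpr) obtained by sweeping a threshold down the
    ranked list. Assumes binary labels (1 = positive, 0 = negative),
    at least one of each, and no tied scores."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels)[order]
    tps = np.concatenate([[0], np.cumsum(ranked == 1)])
    fps = np.concatenate([[0], np.cumsum(ranked == 0)])
    return fps / fps[-1], tps / tps[-1]


def exponential_transform(x, alpha=7.0):
    """Map [0, 1] onto [0, 1] while stretching the region near 0.
    Larger alpha puts more weight on early retrieval."""
    return (1.0 - np.exp(-alpha * x)) / (1.0 - np.exp(-alpha))


def area_under(x, y):
    """Trapezoidal area under the curve defined by the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


if __name__ == "__main__":
    # Toy ranking (assumption): the top-scored item is a negative.
    fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [0, 1, 1, 0])
    print("area under ROC: ", area_under(fpr, tpr))
    print("area under CROC:", area_under(exponential_transform(fpr), tpr))
```

Because the transform stretches the low false-positive region, a ranking that places a negative first is penalized far more under the concentrated curve than under the plain ROC curve.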