cv
General Information
Full Name | Sanjarbek Hudaiberdiev, PhD |
Email | kyrgyzbala@gmail.com, hudaiber@nih.gov |
Scientific research areas
- Genomics, Gene Regulation, Disease Genetics (Type 2 Diabetes)
- Evolution, Origin of Life
- Statistical inference, Machine Learning, LLMs
Technical Skills & Domain Knowledge
Biology | Gene regulation (enhancers, silencers), epigenetics, pancreatic islet biology, variant analysis, population genetics, GWAS, PRS. |
Evolutionary Analysis | non-coding regions, ultra-conserved regions, human-specific evolution (HARs, hCONDELs), protein (domain) evolution, phylogenetics/phylogenomics, genomic synteny, host-parasite arms race |
ML/stats | Statistical Inference, Biostatistics, Bayesian modeling, Probabilistic Reasoning, Causality, ConvNets, LSTMs, Transformers |
ML tools | TensorFlow, PyTorch, Keras, Scikit-learn, NumPy/SciPy, R, MATLAB/Octave |
Bioinformatics | GWAS, fine-mapping, e/s/raQTLs, MPRA, ChIP-seq, RNA-seq, ATAC-seq, Hi-C, ENCODE, Roadmap, HCA, 4D Nucleome, UK Biobank, Containers (Docker, Singularity), Workflows (Nextflow, Snakemake), massive-scale data processing, parallel computing, HPC (SLURM, SGE) |
Software Development | OOP, functional programming, Unit tests, Git |
Languages | Python, R, Rust (learning), Bash, MATLAB, C#, C/C++, Perl |
Databases | MySQL, SQLite, Oracle 10g. |
OS | Linux, macOS |
Experience
-
2023 Staff Scientist
NCBI, NIH, Bethesda, MD - Integrating multi-omics data (TF/histone ChIP-seq, DHS, ATAC-seq, Hi-C, binding sites of RNA-binding proteins, methylome) to model disease transcriptional codes with multi-modal fusion deep neural networks and detect disease-causal genetic variation.
- Using Deep Learning to assess the phenotypic impacts of human-specific (vs. macaques and chimpanzees) indels in enhancers.
-
2020 - 2023 Research Fellow
NCBI, NIH, Bethesda, MD - Developed TREDNet, a novel Deep Learning method for enhancer detection.
- Led a large multidisciplinary collaboration applying TREDNet to detect candidate causal GWAS mutations linked to Type 2 Diabetes, followed by experimental verification.
- Developed a model for explaining the high-occupancy target (HOT) regions in humans using transcriptional condensates.
-
2018-2020 Postdoctoral Fellow
NCBI, NIH, Bethesda, MD Mentor - Ivan Ovcharenko, PhD - Spearheaded the transition to Deep Learning methods for studying enhancers.
- Automated the distributed training of large models (Keras+TensorFlow) on Google Cloud and Biowulf (NIH’s HPC cluster).
- Applied ConvNets and LSTMs to model gene regulation and decompose LD blocks.
- Managed and implemented the migration of decades' worth of in-house code to containers (Docker, Singularity) and workflow management systems (Nextflow, Snakemake).
-
2015-2018 Postdoctoral Fellow
NCBI, NIH, Bethesda, MD Mentor - Eugene Koonin, PhD - Studied the general evolution of CRISPR/Cas systems, with a deep focus on CRISPR-associated protein 4 (Cas4). The resulting paper has been cited >55 times, including by Prof. Jennifer Doudna, and has given me an Erdős number of 3.
- Analyzed evolutionary links between microbial adaptive immune systems and other selfish elements.
- Gained an expert understanding of microbial genetics, comparative genomics, and molecular evolutionary analysis.
-
2011 - 2014 Graduate research program
ICGEB, Trieste, Italy - Thesis project - Functional annotation of Quorum Sensing systems in bacteria using a subsystem-based approach.
- Implemented a wide range of algorithms and tools for computational genomics and sequence analysis.
-
2010 - 2011 Software developer
Octopus, Bishkek, Kyrgyzstan - Developed a concept for a data analysis module for existing commercial microfinance software.
- Implemented the financial data analysis module in the .NET environment using C# and MS SQL Server.
- Developed an ETL module to consolidate the data from government and third-party vendors using Pentaho and Python.
-
2009 - 2010 Oracle Database Developer
Demirbank, Bishkek, Kyrgyzstan - Designed and implemented backup strategies for the Oracle database, decreasing backup time by a factor of 10.
- Led the project to transition the existing infrastructure to Oracle Blade Modules.
- Conducted SQL tuning and optimization, which led to a reduction in ATM transaction latencies during peak times from 10 seconds to < 1 second.
-
2008 - 2009 Research assistant
Istanbul Technical University, Turkey Mentor - Prof. Zehra CATALTEPE - Designed and implemented Machine Learning experiments on protein function prediction using profile-HMMs.
- Received formal graduate training in classical Machine Learning (PRML by Christopher Bishop).
Education
-
2011 - 2014 PhD in Bioinformatics
ICGEB, Trieste, Italy Supervisor - Prof. Sándor PONGOR - Thesis - “Computational analysis of quorum sensing systems in bacterial genomes - Developing automated annotation tools.”
-
2006 - 2008 MSc in Machine Learning
Istanbul Technical University, Istanbul, Turkey
-
2002 - 2006 BSc in Computer Engineering
Erciyes University, Kayseri, Turkey
Oral presentations
-
2023 - Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome NLM IRP Seminar
-
2022 - Transcriptional regulation and disease genetics using Deep Learning NLM IRP Seminar
-
2021 - Deep causality in non-coding GWAS SNPs. NLM Summer Lectures Series. NLM
-
2020 - Using Deep Learning to infer causality of GWAS SNPs. METU, Ankara, Turkey
-
2019 - Recurrent Language model for enhancers. NLM Summer Lectures Series. NLM
- Understanding the grammar of enhancers using Deep Learning. NHGRI
- Causal mutations in ZRS enhancer region using Deep Learning. LBNL, Berkeley
-
2017 - Phylogenomics of Cas4 family nucleases. AIChE. Raleigh, NC
-
2016 - Current topics in biological sequence analysis. ICGEB. Trieste, Italy
Posters
-
2021 - Hyperactive loci within non-coding regulatory regions of human genome. Mechanisms of Eukaryotic Transcription, CSHL, NY.
-
2019 - Two-stage Deep Learning cross-tissue predictor of enhancers shows unprecedented accuracy in detection of causal regulatory variants. Mechanisms of Eukaryotic Transcription, CSHL, NY.
- Using Deep Learning to understand the grammar of enhancers. RECOMB, Washington DC
-
2017 - Phylogenomics of Cas4 family nucleases. International Conference on CRISPR Technologies, Raleigh, NC
-
2013 - Computational approaches to microbial communication/quorum sensing signaling. BAGECO, Ljubljana, Slovenia
Awards
-
2019 - Fellows Award for Research Excellence NIH, Bethesda, MD
-
2011 - ICGEB PhD fellowship ICGEB, Trieste, Italy
-
2008 - Scholarship for research in Machine Learning/Bioinformatics. TUBITAK, Istanbul, Turkey
-
2007 - Scholarship for BS final year project. TUBITAK, Kayseri, Turkey
-
2002 - Scholarship from the Turkish Ministry of Education for undergraduate studies. 2002/2007, Turkey
-
2001 - Bronze medals in national physics olympiads for high school students. 2001/2002, Kyrgyzstan
Teaching & Mentorship
-
2020 - Co-mentor, Post-bac student, NIH
-
2019 - Co-mentor, Summer research student, NIH
-
2018 - Lead instructor, Summer intern journal club, NIH
- Co-mentor, Summer research student, NIH
Peer-reviewing for Scientific Journals
- BMC Bioinformatics
- NAR Database
- Frontiers in Microbiology
Organization & Leadership
-
Committee
- Fellows Award for Research Excellence. Committee member 2023
- Fellows Award for Research Excellence. Committee member 2022
-
Organised
- Bioinformatics - Computer Methods in Molecular Biology, ICGEB, Trieste, Italy 2023
- Bioinformatics - Computer Methods in Molecular Biology, ICGEB, Trieste, Italy 2014
Languages
- English
- Turkish
- Russian
- Italian
- Kyrgyz