cv

General Information

Full Name: Sanjarbek Hudaiberdiev, PhD
Email: kyrgyzbala@gmail.com, hudaiber@nih.gov

Scientific research areas

  • Genomics, Gene Regulation, Disease Genetics (Type 2 Diabetes)
  • Evolution, Origin of Life
  • Statistical inference, Machine Learning, LLMs

Technical Skills & Domain Knowledge

Biology: Gene regulation (enhancers, silencers), epigenetics, pancreatic islet biology, variant analysis, population genetics, GWAS, PRS.
Evolutionary Analysis: Non-coding regions, ultra-conserved regions, human-specific evolution (HARs, hCONDELs), protein (domain) evolution, phylogenetics/phylogenomics, genomic synteny, host-parasite arms race.
ML/stats: Statistical Inference, Biostatistics, Bayesian modeling, Probabilistic Reasoning, Causality, ConvNets, LSTMs, Transformers.
ML tools: TensorFlow, PyTorch, Keras, Scikit-learn, NumPy/SciPy, R, MATLAB/Octave.
Bioinformatics: GWAS, fine-mapping, e/s/raQTLs, MPRA, ChIP-seq, RNA-seq, ATAC-seq, Hi-C, ENCODE, Roadmap, HCA, 4D Nucleome, UK Biobank, Containers (Docker, Singularity), Workflows (Nextflow, Snakemake), massive-scale data processing, parallel computing, HPC (SLURM, SGE).
Software Development: OOP, functional programming, Unit tests, Git.
Languages: Python, R, Rust (learning), Bash, MATLAB, C#, C/C++, Perl.
Databases: MySQL, SQLite, Oracle 10g.
OS: Linux, macOS.

Experience

  • 2023
    Staff Scientist
    NCBI, NIH, Bethesda, MD
    • Integrating multi-omics data (TF/histone ChIP-seq, DHS, ATAC-seq, Hi-C, RNA-binding protein binding sites, methylome) to model disease transcriptional codes with multi-modal fusion deep neural networks and detect disease-causal genetic variation.
    • Using Deep Learning to assess the phenotypic impacts of human-specific (vs. macaques and chimpanzees) INDELs in enhancers.
  • 2020 - 2023
    Research Fellow
    NCBI, NIH, Bethesda, MD
    • Developed TREDNet, a novel Deep Learning method for enhancer detection.
    • Led a large multidisciplinary collaboration that applied TREDNet to detect candidate causal GWAS mutations linked to Type 2 Diabetes, followed by experimental verification.
    • Developed a model for explaining the high-occupancy target (HOT) regions in humans using transcriptional condensates.
  • 2018-2020
    Postdoctoral Fellow
    NCBI, NIH, Bethesda, MD
    Mentor - Ivan Ovcharenko, PhD
    • Spearheaded the transition to Deep Learning methods for studying enhancers.
    • Automated the distributed training of large models (Keras+Tensorflow) on Google Cloud and Biowulf (NIH’s HPC cluster).
    • Applied ConvNets and LSTMs to model gene regulation and decompose LD blocks.
    • Managed and implemented the migration of decades' worth of in-house code to containers (Docker, Singularity) and workflow management systems (Nextflow, Snakemake).
  • 2015-2018
    Postdoctoral Fellow
    NCBI, NIH, Bethesda, MD
    Mentor - Eugene Koonin, PhD
    • Studied the general evolution of CRISPR/Cas systems, with a deep focus on the CRISPR-associated protein 4 (Cas4). The paper has been cited >55 times, including by Prof. Jennifer Doudna, and has given me an Erdős number of 3.
    • Analyzed evolutionary links between microbial adaptive immune systems and other selfish elements.
    • Gained an expert understanding of microbial genetics, comparative genomics, and molecular evolutionary analysis.
  • 2011 - 2014
    Graduate research program
    ICGEB, Trieste, Italy
    • Thesis project - Functional annotation of Quorum Sensing systems in bacteria using a subsystem-based approach.
    • Implemented a wide range of algorithms and tools for computational genomics and sequence analysis.
  • 2010 - 2011
    Software developer
    Octopus, Bishkek, Kyrgyzstan
    • Developed the concept for a data analysis module for an existing commercial microfinance software product.
    • Implemented the financial data analysis module in the .NET environment using C# and MS SQL Server.
    • Developed an ETL module to consolidate data from government and third-party vendors using Pentaho and Python.
  • 2009 - 2010
    Oracle Database Developer
    Demirbank, Bishkek, Kyrgyzstan
    • Designed and implemented backup strategies for the Oracle database, decreasing the backup time by a factor of 10.
    • Led the project to transition the existing infrastructure to Oracle Blade Modules.
    • Conducted SQL tuning and optimization, which led to a reduction in ATM transaction latencies during peak times from 10 seconds to < 1 second.
  • 2008 - 2009
    Research assistant
    Istanbul Technical University, Turkey
    Mentor - Prof. Zehra CATALTEPE
    • Designed and implemented Machine Learning experiments on protein function prediction using profile-HMMs.
    • Obtained formal graduate training in classical Machine Learning (PRML by Christopher Bishop).

Education

  • 2011 - 2014
    PhD in Bioinformatics
    ICGEB, Trieste, Italy
    Supervisor - Prof. Sándor PONGOR
    • Thesis - “Computational analysis of quorum sensing systems in bacterial genomes - Developing automated annotation tools.”
  • 2006 - 2008
    MSc in Machine Learning
    Istanbul Technical University, Istanbul, Turkey
  • 2002 - 2006
    BSc in Computer Engineering
    Erciyes University, Kayseri, Turkey

Oral presentations

  • 2023
    • Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome. NLM IRP Seminar
  • 2022
    • Transcriptional regulation and disease genetics using Deep Learning. NLM IRP Seminar
  • 2021
    • Deep causality in non-coding GWAS SNPs. NLM Summer Lectures Series. NLM
  • 2020
    • Using Deep Learning to infer causality of GWAS SNPs. METU, Ankara, Turkey
  • 2019
    • Recurrent Language model for enhancers. NLM Summer Lectures Series. NLM
    • Understanding the grammar of enhancers using Deep Learning. NHGRI
    • Causal mutations in ZRS enhancer region using Deep Learning. LBNL, Berkeley
  • 2017
    • Phylogenomics of Cas4 family nucleases. AIChE. Raleigh, NC
  • 2016
    • Current topics in biological sequence analysis. ICGEB. Trieste, Italy

Posters

  • 2021
    • Hyperactive loci within non-coding regulatory regions of human genome. Mechanisms of Eukaryotic Transcription, CSHL, NY.
  • 2019
    • Two-stage Deep Learning cross-tissue predictor of enhancers shows unprecedented accuracy in detection of causal regulatory variants. Mechanisms of Eukaryotic Transcription, CSHL, NY.
    • Using Deep Learning to understand the grammar of enhancers. RECOMB, Washington DC
  • 2017
    • Phylogenomics of Cas4 family nucleases. International Conference on CRISPR Technologies, Raleigh, NC
  • 2013
    • Computational approaches to microbial communication/quorum sensing signaling. BAGECO, Ljubljana, Slovenia

Awards

  • 2019
    • Fellows Award for Research Excellence. NIH, Bethesda, MD
  • 2011
    • ICGEB PhD fellowship. ICGEB, Trieste, Italy
  • 2008
    • Scholarship for research in Machine Learning/Bioinformatics. TUBITAK, Istanbul, Turkey
  • 2007
    • Scholarship for BS final year project. TUBITAK, Kayseri, Turkey
  • 2002
    • Scholarship of the Turkish Ministry of Education for undergraduate studies. 2002/2007, Turkey
  • 2001
    • Bronze medals in national physics olympiads for high school students. 2001/2002, Kyrgyzstan

Teaching & Mentorship

  • 2020
    • Co-mentor, Post-bac student, NIH
  • 2019
    • Co-mentor, Summer research student, NIH
  • 2018
    • Lead instructor, Summer intern journal club, NIH
    • Co-mentor, Summer research student, NIH

Peer-reviewing for Scientific Journals

  • BMC Bioinformatics
  • NAR Database
  • Frontiers in Microbiology

Organization & Leadership

  • Committee
    • Fellows Award for Research Excellence. Committee member 2023
    • Fellows Award for Research Excellence. Committee member 2022
  • Organised
    • Bioinformatics - Computer Methods in Molecular Biology. ICGEB, Trieste, Italy, 2023
    • Bioinformatics - Computer Methods in Molecular Biology. ICGEB, Trieste, Italy, 2014

Languages

  • English
  • Turkish
  • Russian
  • Italian
  • Kyrgyz