jaechang-hits

@jaechang-hits

GitHub
47 Skills
8836 Total Stars
March 2026 Joined

Public Skills

gnomad-database

by jaechang-hits

"Query gnomAD v4 population variant frequencies via GraphQL API. Retrieve allele counts and frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint metrics (pLI, LOEUF, missense z-score), and read depth coverage. Identify variants with low population frequency or under evolutionary constraint. For clinical pathogenicity classifications use clinvar-database; for GWAS associations use gwas-database."

API Dev 188 3mo ago

dbsnp-database

by jaechang-hits

"Query NCBI dbSNP for SNP records by rsID, gene, or genomic region via E-utilities (esearch, efetch, epost) and NCBI Variation Services REST API. Retrieve allele data, minor allele frequency, variant class (SNV, indel, MNV), clinical significance links, and cross-database IDs (ClinVar, dbVar, 1000G). Free access; 3 req/sec without API key, 10 req/sec with key. For clinical pathogenicity classifications use clinvar-database; for population frequencies use gnomad-database."

API Dev 188 3mo ago
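
The request limits quoted for dbsnp-database (3 req/sec without an API key, 10 req/sec with one) imply a minimum spacing between calls. Below is a minimal sketch of client-side throttling. The `Throttle` helper and `min_interval` function are hypothetical names introduced for illustration; they are not part of the skill or of any NCBI library. The clock and sleep functions are injectable so the logic can be exercised without network access.

```python
import time

# Limits as stated in the dbsnp-database description.
REQS_PER_SEC_NO_KEY = 3
REQS_PER_SEC_WITH_KEY = 10


def min_interval(has_api_key: bool) -> float:
    """Smallest allowed gap (seconds) between two consecutive requests."""
    rate = REQS_PER_SEC_WITH_KEY if has_api_key else REQS_PER_SEC_NO_KEY
    return 1.0 / rate


class Throttle:
    """Hypothetical helper: spaces calls at least min_interval apart."""

    def __init__(self, has_api_key=False, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(has_api_key)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before each E-utilities request would keep a single-threaded client under the stated limit; concurrent clients would need a shared limiter instead.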

pyhealth

by jaechang-hits

"PyHealth is a Python library for healthcare machine learning. Build clinical prediction models from EHR (Electronic Health Record) data: process MIMIC-III/IV, eICU, and OMOP-CDM datasets, encode medical codes (ICD, ATC, NDC), construct patient-level datasets, and train models (Transformer, RETAIN, GRASP, MedBERT) for tasks including mortality prediction, drug recommendation, readmission, and diagnosis prediction. Alternatives: FIDDLE (EHR preprocessing only), clinical-longformer (NLP on clinical notes only), ehr-ml (EHR embedding only)."

Automation 188 3mo ago

cellpose-cell-segmentation

by jaechang-hits

"Deep learning cell and nucleus segmentation from fluorescence and brightfield microscopy images. Uses pre-trained models (cyto3, nuclei, tissuenet) and a generalist flow-based algorithm that segments cells without requiring retraining on new image types. Outputs label masks for downstream morphology measurement and tracking. Use scikit-image watershed for rule-based segmentation; use Cellpose when deep learning generalization across staining conditions is needed."

Automation 188 3mo ago

pymc-bayesian-modeling

by jaechang-hits

"Bayesian modeling with PyMC 5. 8-step workflow: define model, set priors, define likelihood, sample (NUTS/ADVI), diagnose (R-hat, ESS, divergences), interpret posteriors, compare models (LOO/WAIC), predict. Hierarchical, logistic, GP model variants. Prior/posterior predictive checks."

Agents 188 3mo ago

clinical-decision-support-documents

by jaechang-hits

"Guidelines for generating clinical decision support (CDS) documents: patient cohort analyses (biomarker-stratified outcomes) and treatment recommendation reports (GRADE-graded evidence). Covers document structure, executive summary design, evidence grading (GRADE 1A–2C), statistical reporting (HR, CI, survival), and biomarker integration. Use when creating pharmaceutical research documents, clinical guidelines, or regulatory submissions."

Code Review 188 3mo ago

histolab-wsi-processing

by jaechang-hits

"Whole slide image processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC preprocessing. Use for dataset preparation, tile-based deep learning, and slide quality assessment. For advanced spatial proteomics or multiplexed imaging use pathml."

Analytics 188 3mo ago

scikit-survival-analysis

by jaechang-hits

"Survival analysis and time-to-event modeling with scikit-survival. Cox proportional hazards (standard/elastic net), Random Survival Forests, Gradient Boosting, SVMs for censored data. C-index (Harrell/Uno), Brier score, time-dependent AUC evaluation. Kaplan-Meier, Nelson-Aalen, competing risks. scikit-learn Pipeline/GridSearchCV compatible. For frequentist regression use statsmodels; for Bayesian survival use pymc; for simpler parametric models use lifelines."

CI/CD 188 3mo ago
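
To make the C-index mentioned in scikit-survival-analysis concrete, here is a plain-Python sketch of Harrell's concordance index for right-censored data. It is an illustration only, not scikit-survival's implementation (the library ships its own estimators in `sksurv.metrics`), and it makes a simplifying assumption: pairs with tied observed times are skipped. The function name `harrell_c_index` is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (simplified: tied times are skipped).

    times:       observed time per subject
    events:      True if the event occurred, False if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in reverse.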

flowio-flow-cytometry

by jaechang-hits

"Parse and create FCS (Flow Cytometry Standard) files v2.0-3.1. Read event data as NumPy arrays, extract channel metadata, handle multi-dataset files, export to CSV/FCS. For advanced gating and compensation use FlowKit."

Code Gen 188 3mo ago

bioservices-multi-database

by jaechang-hits

"Unified Python interface to 40+ bioinformatics web services via bioservices library. Query UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem compounds, run BLAST searches, map identifiers across databases, retrieve GO annotations, and find protein-protein interactions. For single-database deep queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database integration workflows."

API Dev 188 3mo ago

pydicom-medical-imaging

by jaechang-hits

"Pure Python DICOM library for medical imaging (CT, MRI, X-ray, ultrasound). Read/write DICOM files, extract pixel data as NumPy arrays, access/modify metadata tags, apply windowing (VOI LUT), anonymize PHI, build DICOM from scratch, process series into 3D volumes. For whole-slide pathology images use histolab; for NIfTI neuroimaging use nibabel."

Processing 188 3mo ago

bedtools-genomic-intervals

by jaechang-hits

"Toolkit for genomic interval operations on BED, BAM, GFF, VCF files. Find overlapping regions, merge adjacent intervals, calculate coverage depth, extract FASTA sequences, find nearest features, and manipulate interval coordinates. Essential for ChIP-seq peak annotation, target region filtering, and genome arithmetic. Use tabix instead for indexed single-region queries; use deeptools for normalized bigWig coverage."

CLI Tools 188 3mo ago
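
The "merge adjacent intervals" operation listed for bedtools-genomic-intervals can be sketched in plain Python. This is an illustration of the idea, not bedtools itself: the function name `merge_intervals` is hypothetical, coordinates are assumed to follow the BED convention (0-based start, exclusive end), and chromosomes are sorted as plain strings rather than in genome order.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended intervals on each chromosome.

    intervals: iterable of (chrom, start, end) tuples in BED convention.
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [
    ("chr1", 10, 20),
    ("chr1", 15, 30),
    ("chr1", 30, 40),
    ("chr2", 5, 8),
    ("chr1", 100, 110),
]
print(merge_intervals(regions))
# [('chr1', 10, 40), ('chr1', 100, 110), ('chr2', 5, 8)]
```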

celltypist-cell-annotation

by jaechang-hits

"Automated cell type annotation for scRNA-seq data using pre-trained logistic regression models. CellTypist ships 45+ models covering immune cells, gut, lung, brain, fetal tissues, and cancer microenvironments. Inputs a normalized AnnData; outputs per-cell predicted labels, majority-vote cluster labels, and confidence scores. Use when you want fast, reproducible, reference-model-backed annotation without manual marker inspection."

Comments 188 3mo ago

pyimagej-fiji-bridge

by jaechang-hits

"Python bridge to ImageJ2/Fiji enabling macro execution, plugin calls (Bio-Formats, TrackMate, Analyze Particles), bidirectional NumPy↔ImagePlus/ImgLib2 data exchange, and ImageJ Ops from Python. Use for automating Fiji-specific workflows headlessly from Python scripts. Use scikit-image instead for pure Python pipelines that do not require Fiji plugins; use napari for interactive visualization."

Code Gen 188 3mo ago

bwa-mem2-dna-aligner

by jaechang-hits

"Fast short-read DNA aligner for whole-genome, whole-exome, and ChIP-seq alignment to a reference genome. BWA-MEM2 is the 2× faster successor to BWA-MEM; outputs SAM/BAM with read group headers required by GATK. Produces primary alignments with supplementary records for chimeric reads. Use STAR instead for RNA-seq splice-aware alignment; use Bowtie2 as an alternative with comparable accuracy."

CLI Tools 188 3mo ago

archs4-database

by jaechang-hits

"Query uniformly processed RNA-seq gene expression profiles, tissue-specific expression patterns, and co-expression networks from the ARCHS4 database REST API. Retrieve z-score normalized expression across 1M+ human and mouse samples, find co-expressed genes, search samples by metadata, and download HDF5 expression matrices. For variant-level population genetics use gnomad-database; for pathway enrichment from gene lists use gget-genomic-databases (Enrichr)."

API Dev 188 3mo ago

trackpy-particle-tracking

by jaechang-hits

"Python library for tracking particles (fluorescent spots, colloids, vesicles, cells) in video microscopy using the Crocker-Grier algorithm. Core modules: locate particles in single frames, batch-process image sequences, link positions into trajectories, filter short-lived tracks, and compute mean squared displacement (MSD) for diffusion analysis. Supports 2D and 3D tracking with subpixel accuracy. Integrates with pims for reading TIF stacks, AVI, and image series. Use when you need quantitative single-particle tracking (SPT) from fluorescence or brightfield video and downstream diffusion coefficient extraction."

Automation 188 3mo ago

imaging-data-commons

by jaechang-hits

"Query NCI Imaging Data Commons (IDC) for cancer radiology and pathology imaging datasets hosted on Google Cloud. Search DICOM collections by modality, anatomical site, cancer type, or collection name. Download images via Google Cloud Storage or IDAT tool. 50TB+ of publicly accessible DICOM images. Requires Google Cloud account for large downloads; small queries work without billing. For local DICOM processing use pydicom-medical-imaging; for whole-slide pathology use histolab."

Processing 188 3mo ago

nnunet-segmentation

by jaechang-hits

"Train and deploy automated medical image segmentation models using nnU-Net's self-configuring framework that auto-selects optimal architecture, preprocessing, and training for any modality. Supports CT, MRI, microscopy, and ultrasound with 2D, 3D full-res, 3D low-res, and cascade configurations. Pipeline: convert dataset → plan and preprocess → train (5-fold cross-validation) → find best configuration → predict → ensemble. Use when classical segmentation fails and annotated training data is available."

Comments 188 3mo ago

clinvar-database

by jaechang-hits

"Query NCBI ClinVar via E-utilities REST API for clinical significance, pathogenicity classifications, and disease associations of genetic variants. Search by gene, rsID, condition, or review status. Returns structured variant records: ClinSig, submitter data, conditions, HGVS expressions. For GWAS associations use gwas-database; for variant consequence prediction use Ensembl VEP."

Processing 188 3mo ago

geniml

by jaechang-hits

"Geniml is a Python library for genomic interval machine learning. Train and apply region2vec embeddings to convert BED file regions into numeric vectors, load and index genomic interval datasets for ML pipelines, search embedding spaces with BEDSpace, and evaluate embedding quality. Use for chromatin accessibility clustering, regulatory element classification, cross-sample region comparison, and building ML models on genomic intervals."

Accessibility 188 3mo ago

deseq2-differential-expression

by jaechang-hits

"Differential expression analysis for bulk RNA-seq using R/Bioconductor DESeq2. Negative binomial GLM with empirical Bayes shrinkage, Wald and LRT tests, multi-factor designs, interaction terms, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap visualization. The R gold standard for DE analysis with native Bioconductor integration. Use pydeseq2-differential-expression for Python-based pipelines; use edgeR for TMM normalization."

Processing 188 3mo ago

simpleitk-image-registration

by jaechang-hits

"Register, segment, filter, and resample 3D medical images (MRI, CT, microscopy) using SimpleITK's Python API with support for DICOM, NIfTI, and multi-modal image analysis. Provides rigid/affine/deformable registration, threshold and region-growing segmentation, Gaussian and morphological filtering, label statistics, and format conversion. Use when aligning volumetric images across timepoints or modalities, automating segmentation of fluorescence microscopy, or converting DICOM series to NIfTI for analysis pipelines."

Processing 188 3mo ago

cellxgene-census

by jaechang-hits

"Query CELLxGENE Census (61M+ cells) programmatically. Search by cell type, tissue, disease, organism. Get expression matrices as AnnData, stream large queries out-of-core, train PyTorch models on single-cell data. For analyzing your own data use scanpy; for annotated data manipulation use anndata."

Automation 188 3mo ago

featurecounts-rna-counting

by jaechang-hits

"Counts aligned RNA-seq reads overlapping gene features in a GTF annotation. Takes sorted BAM files from STAR alignment and a GTF file; outputs a tab-delimited count matrix per gene across all samples. Handles strandedness (0=unstranded, 1=stranded, 2=reverse-stranded), paired-end, and multi-sample batch counting in a single command. Use Salmon instead for alignment-free quantification; use featureCounts when STAR BAMs already exist and a gene-level count matrix is needed."

CLI Tools 188 3mo ago
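
The strandedness codes in the featurecounts-rna-counting description (0, 1, 2) map onto featureCounts' `-s` option. The sketch below only assembles a command line as a Python list; the `featurecounts_cmd` helper and the file names in the usage line are hypothetical. It uses the `-a`, `-o`, `-s`, `-T`, and `-p` options; depending on the Subread version, paired-end fragment counting may need further options, so check the featureCounts documentation before running it.

```python
# Strandedness codes as given in the skill description.
STRANDEDNESS = {"unstranded": 0, "stranded": 1, "reverse-stranded": 2}


def featurecounts_cmd(gtf, out, bams, strandedness="unstranded",
                      paired_end=False, threads=1):
    """Build a featureCounts argument list (hypothetical helper)."""
    cmd = [
        "featureCounts",
        "-a", gtf,                                # GTF annotation
        "-o", out,                                # output count table
        "-s", str(STRANDEDNESS[strandedness]),    # 0, 1, or 2
        "-T", str(threads),                       # worker threads
    ]
    if paired_end:
        cmd.append("-p")                          # input is paired-end
    cmd.extend(bams)                              # all samples in one call
    return cmd


# Placeholder file names, for illustration only.
cmd = featurecounts_cmd("genes.gtf", "counts.txt", ["a.bam", "b.bam"],
                        strandedness="reverse-stranded", paired_end=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where featureCounts is installed.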

napari-image-viewer

by jaechang-hits

"Interactive multi-dimensional image viewer for scientific microscopy data. Napari displays 2D/3D/4D arrays as Image, Labels, Points, Shapes, and Tracks layers; supports real-time annotation, plugin-based analysis, and headless screenshot export. Core visualization tool for bioimage analysis workflows. Use ImageJ/FIJI for macro-based processing; use napari for Python-native interactive visualization and plugin-based deep learning segmentation review."

Comments 188 3mo ago

gwas-database

by jaechang-hits

"NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published genome-wide association studies. Query studies, associations, variants, traits, genes, and summary statistics. Build polygenic risk score candidates, analyze variant pleiotropy, download summary statistics for Manhattan plots. No authentication required."

API Dev 188 3mo ago

geo-database

by jaechang-hits

"Query NCBI Gene Expression Omnibus (GEO) for gene expression datasets and sample metadata via GEOparse Python library and E-utilities. Search datasets by keyword/organism/platform, download GSE series matrices, parse GPL platform annotations, extract GSM sample metadata, and load expression matrices into pandas. For single-cell data use cellxgene-census; for programmatic multi-DB access use gget-genomic-databases."

Comments 188 3mo ago

clinpgx-database

by jaechang-hits

"Query PharmGKB (Clinical Pharmacogenomics) database via REST API for drug-gene interactions, clinical annotations, dosing guidelines (CPIC, DPWG), variant-drug associations, and pharmacogenomic pathways. Search by gene, drug, rsID, or pathway. No authentication required. For somatic cancer pharmacogenomics use cosmic-database or opentargets-database; for drug structures use chembl-database-bioactivity."

API Dev 188 3mo ago

gatk-variant-calling

by jaechang-hits

"GATK Best Practices pipeline for germline SNP and indel variant calling from WGS/WES BAM files. Runs HaplotypeCaller in GVCF mode per sample, consolidates with GenomicsDBImport, joint-genotypes with GenotypeGVCFs, and applies VQSR or hard filters. Requires BWA-MEM2-aligned, duplicate-marked, and BQSR-processed BAMs. Use DeepVariant instead for a faster deep-learning alternative; GATK is the ENCODE/NIH standard for research and clinical genomics."

CLI Tools 188 3mo ago

gseapy-gene-enrichment

by jaechang-hits

"Gene set enrichment analysis (GSEA) and over-representation analysis (ORA) for RNA-seq and proteomics data. Wraps Enrichr API for ORA against MSigDB, KEGG, GO, and 200+ gene set databases; implements preranked GSEA for ranked gene lists from differential expression. Outputs enrichment tables and GSEA running-score plots. Use after DESeq2 or edgeR for pathway-level interpretation of differential expression results."

API Dev 188 3mo ago

ena-database

by jaechang-hits

"European Nucleotide Archive (ENA) REST API access for genomic sequences, raw reads, assemblies, and annotations. Portal API search with query syntax, Browser API retrieval (XML/FASTA/EMBL), file reports for FASTQ/BAM download URLs, taxonomy queries, cross-references. For multi-database Python queries prefer bioservices; for NCBI-specific queries use pubmed-database or Biopython Entrez."

API Dev 188 3mo ago

ensembl-database

by jaechang-hits

"Query Ensembl REST API for gene/transcript/variant annotations across 300+ species. Retrieve gene info by symbol/ID, sequence, cross-references (HGNC, RefSeq, UniProt), variants, regulatory features, comparative genomics. For bulk local access use pyensembl; for pathway lookups use kegg-database or reactome-database."

API Dev 188 3mo ago

cnvkit-copy-number

by jaechang-hits

"Detect somatic copy number variants (CNVs) from WES, WGS, or targeted sequencing BAM files with CNVkit v0.9.x. Pipeline: calculate bin-level coverage in target/antitarget regions, normalize against a reference, segment copy ratios with CBS or HMM, call amplifications and deletions, generate scatter/diagram plots, estimate tumor purity and ploidy, and export to VCF/SEG. Both CLI and Python API (cnvlib) shown. Use GATK CNV instead for deep WGS with population-scale controls; use CNVkit for targeted or exome sequencing where antitarget bins are critical."

Processing 188 3mo ago

gget-genomic-databases

by jaechang-hits

"Unified CLI/Python interface to 20+ genomic databases. Use for quick gene lookups (Ensembl search/info/seq), BLAST/BLAT sequence alignment, AlphaFold structure prediction, enrichment analysis (Enrichr), disease/drug associations (OpenTargets), single-cell data (CELLxGENE), cancer genomics (cBioPortal/COSMIC), and expression correlation (ARCHS4). Covers genomics, proteomics, and disease domains. For batch processing or advanced BLAST use biopython; for multi-database Python SDK workflows use bioservices."

Processing 188 3mo ago

harmony-batch-correction

by jaechang-hits

"Batch correction for single-cell RNA-seq (and other omics) with Harmony. Removes technical batch effects from PCA embeddings while preserving biological variation. Use after PCA, before UMAP/neighbors. Fast and scalable to millions of cells. Python (harmonypy, scanpy integration) and R (Seurat) APIs."

Analytics 188 3mo ago

gene-database

by jaechang-hits

"Query NCBI Gene via E-utilities for curated gene records across 1M+ taxa. Retrieve official gene symbols, aliases, RefSeq accessions, summary descriptions, genomic coordinates, GO annotations, and interaction data. Use for gene ID resolution, cross-species queries, and gene function summaries. For sequence retrieval use Ensembl; for expression data use geo-database."

Comments 188 3mo ago

deeptools-ngs-analysis

by jaechang-hits

"NGS analysis CLI toolkit for ChIP-seq, RNA-seq, ATAC-seq. BAM→bigWig conversion with normalization (RPGC, CPM, RPKM), sample correlation/PCA, heatmaps and profile plots around genomic features, enrichment fingerprints. For alignment use STAR/BWA; for peak calling use MACS2."

CLI Tools 188 3mo ago

gtars

by jaechang-hits

"GTARS is a Rust-backed Python library for fast genomic token arithmetic and BED file processing. Perform high-performance BED file I/O, genomic interval set operations (intersect, merge, complement, subtract), tokenization of genomic regions against a universe, and universe construction. Use for preprocessing large BED file collections, building token vocabularies for ML pipelines, and computing interval statistics at scale."

Embeddings 188 3mo ago

biopython-sequence-analysis

by jaechang-hits

"Biopython toolkit for sequence analysis workflows: parse FASTA/FASTQ/GenBank/GFF with SeqIO, query NCBI databases via Entrez (esearch/efetch/elink), run remote and local BLAST with result parsing, perform pairwise and multiple sequence alignment (PairwiseAligner, MUSCLE/ClustalW), and build/visualize phylogenetic trees (Phylo module). Use for gene family studies, phylogenomics, comparative genomics, and programmatic NCBI pipelines. For PCR design, restriction digestion, and cloning workflows use biopython-molecular-biology; for SAM/BAM alignments use pysam."

Processing 188 3mo ago

cbioportal-database

by jaechang-hits

"Access TCGA and other cancer genomics datasets via cBioPortal REST API. Retrieve somatic mutations, copy number alterations, gene expression profiles, and clinical data (survival, stage, treatment) for thousands of cancer studies. Use for tumor mutation burden analysis, oncoprint queries, and survival analysis. For population variant frequencies use gnomad-database; for drug-gene interactions use dgidb-database."

API Dev 188 3mo ago

cosmic-database

by jaechang-hits

"Query COSMIC (Catalogue Of Somatic Mutations In Cancer) for cancer somatic mutations, gene census data, mutational signatures, drug resistance variants, and cancer gene annotations. REST API v3.1 supports gene/sample/variant queries. Free registration required. For germline clinical variants use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity."

API Dev 188 3mo ago

anndata-data-structure

by jaechang-hits

"Annotated data matrices for single-cell genomics. AnnData stores expression data (X) with observation metadata (obs), variable metadata (var), layers, embeddings (obsm/varm), graphs (obsp/varp), and unstructured data (uns). Use for .h5ad/.zarr file handling, dataset concatenation, and scverse ecosystem integration. For analysis workflows use scanpy; for probabilistic models use scvi-tools."

Comments 188 3mo ago

omero-integration

by jaechang-hits

"OMERO is an open-source platform for biological image data management. Use the omero-py Python client to connect to an OMERO server, search and retrieve images as numpy arrays, annotate images with tags and key-value pairs, manage ROIs, and integrate OMERO image data into downstream analysis pipelines — all programmatically without the OMERO desktop GUI."

Comments 188 3mo ago

encode-database

by jaechang-hits

"Query the ENCODE Portal REST API for regulatory genomics data: TF ChIP-seq experiments, ATAC-seq/DNase-seq accessibility peaks, histone mark tracks, and RNA-seq datasets across 1000+ cell types and tissues. Search experiments by assay, biosample, or target protein; download BED/bigWig files; retrieve candidate cis-regulatory elements (cCREs) from ENCODE SCREEN by genomic region or gene. Use for finding regulatory tracks to annotate variants, identifying open chromatin in a cell type of interest, and downloading peak files for ChIP-seq or ATAC-seq analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database."

Accessibility 188 3mo ago

bcftools-variant-manipulation

by jaechang-hits

"Command-line toolkit for VCF/BCF variant file manipulation. Filter, merge, annotate, query, normalize, and compute statistics on variant call files. Essential for post-variant-calling pipelines: quality filtering, multi-sample merging, rsID annotation, and genotype extraction. Companion to samtools in the HTSlib ecosystem. Use GATK instead for complex indel realignment during variant calling; use VCFtools instead for population genetics statistics."

CLI Tools 188 3mo ago

etetoolkit

by jaechang-hits

"ETE Toolkit (ETE3) is a Python environment for phylogenetic tree analysis, manipulation, and visualization. Parse Newick/NHX/PhyloXML trees, traverse and annotate nodes, render publication-quality figures with TreeStyle/NodeStyle, integrate NCBI taxonomy for taxon-aware operations, and run PhyloTree workflows for comparative genomics. Use for building species trees, gene family evolution analysis, and annotated tree figures."

Comments 188 3mo ago