Research
1. Big data approaches to immunology
One of the most important branches of the Tao Wang Lab's research applies modern statistical and machine learning approaches to immunological problems in human cancers and immune-based therapies. Our lab has published methodological work in this area, and our collaborators have published studies that leverage the models and methods developed from our research.
For example, one of our earlier works used tumorgrafts (PDX models) as a reference for tumor cells to accurately dissect the molecular and cellular patterns in the tumor stroma. We developed the DisHet algorithm, a Bayesian hierarchical model, to dissect bulk tumor RNA-Seq data by leveraging patient-matched PDX and normal tissue RNA-Seq data. DisHet analyses uncovered 610 genes not previously linked to the kidney tumor microenvironment and identified a new, highly inflamed kidney cancer subtype.
In another work, we developed a linear B cell epitope prediction model, BepiTBR, based on T-B cell reciprocity. We showed that explicitly including the enrichment of putative CD4+ T cell epitopes (predicted HLA class II epitopes) in the model leads to significant enhancement in the prediction of linear B cell epitopes. Curiously, the positive impact on B cell epitope generation is specific to the enrichment of DQ allele binders. Our work provides interesting mechanistic insights into the generation of B cell epitopes and points to a new avenue to improve B cell epitope prediction for the field.
2. Predictive modeling for highly impactful biomedical questions
Our lab has a strong background in machine learning and deep learning research for biomedical problems, especially in predicting prognosis, treatment response, and adverse effects of therapeutic treatments.
Members of our lab participated in several DREAM Challenges, which are a series of nationwide competitions that attract scientists of all disciplines to come up with the best prediction algorithms for important questions in biological and clinical research. These challenges include the BROAD-DREAM Gene Essentiality Prediction Challenge, the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge, the NCI-DREAM Drug Sensitivity Prediction Challenge, and the Prostate Cancer DREAM Challenge.
We developed the pMTnet model to predict the TCR-binding specificities of neoantigens, and of T cell antigens in general, presented by class I major histocompatibility complexes (pMHCs). pMTnet achieved an AUROC of 0.833. We are now initiating a biotech startup company to commercialize this technology for designing TCRs for TCR-T therapies. We built a contrastive learning-based model, named Benisse, to embed B cell receptor sequences (BCRs) numerically and to integrate BCRs with single B cell phenotypes. Most recently, we created the Cmai model for predicting the pairing between BCRs and antigens.
3. Software and methodological development to analyze single cell profiling and spatial transcriptomics data
Our lab has rich experience with the analysis and methodological development of many different types of -omics data. More recently, our research interests have turned to single cell profiling and spatial transcriptomics data analyses, as nowadays such techniques are used frequently to study tumor cells and immune cells, as well as other cell types.
Our lab published SCINA, the first semi-supervised clustering and typing algorithm for single cell RNA-Seq and CyTOF data. We also developed tessa (TCR functional landscape estimation supervised with scRNA-Seq analysis) to integrate TCRs with the phenotypes of T cells, in order to estimate the effect that TCRs confer upon those phenotypes. By applying tessa to a series of single T cell sequencing datasets, we demonstrated that TCR similarity constrains the phenotypes of T cells to be similar and dictates a gradient in the antigen targeting efficiency of T cell clonotypes with convergent TCRs.
For B cells, we developed a model called Benisse, which revealed a curious coupling effect between BCRs and single B cell phenotypes in various diseases. Our analyses indicate that the BCR signalling pathway is most activated and induces the strongest BCR rearrangement events in earlier severe phases of COVID-19 and weakens when the patients are on the pathway to recovery.
Our newest publication is the Spacia software, which leverages spatial and gene expression information to infer cell-to-cell interactions from single cell resolution spatially resolved transcriptomics data. We showed the advantages of this approach over prior methods.
4. Development of online webservers and portals to host statistical models and bioinformatics software for the research community
Our lab has rich experience in the development of webservers to host the models and methods that were developed from our research to facilitate open data sharing and to offer biologists and clinicians easy access to computational functionalities.
For example, we created the LinkageAnalyzer software for phenotype-genotype mapping of mouse forward genetic screening data. We incorporated LinkageAnalyzer into the Mutagenetix database, led by Dr. Bruce Beutler, and it is now run on a daily basis for identifying immune-response genes from ENU-mutated mice.
Other examples of our work include:
- DisHet webserver for RNA-Seq data deconvolution
- SCINA webserver for single cell sequencing data analyses
- BepiTBR webserver for B cell epitope prediction
We are starting to organize these separate entities into a centralized web portal and data commons, which provides user-friendly, comprehensive, and cloud-based data sharing and computational resources for computational immunology.