Customer Name | DEMO |
---|---|
Customer Institution | MIT |
Customer Email | xxx@mit.edu |
Project ID | 2019A15xx |
Single-cell genomic technologies have revolutionized the way scientists can interrogate heterogeneous tissues or rare subpopulations of cells. Single-cell RNA sequencing (scRNA-seq) has been at the forefront of this method development, both in the laboratory and computationally, providing robust methods for downstream data analysis.
10x Genomics’ scRNA-seq technology, the Chromium™ Single Cell 3’ Solution, analyzes transcriptomes on a cell-by-cell basis by using microfluidic partitioning to capture single cells and prepare barcoded next-generation sequencing (NGS) cDNA libraries. Specifically, single cells, reverse transcription (RT) reagents, Gel Beads containing barcoded oligonucleotides, and oil are combined on a microfluidic chip to form reaction vesicles called Gel Beads in Emulsion, or GEMs. GEMs are formed in parallel within the microfluidic channels of the chip, allowing the user to process hundreds to tens of thousands of single cells in a single 7-minute Chromium™ Instrument run. Cells are loaded at a limiting dilution to maximize the number of GEMs containing a single cell, which keeps the doublet rate low while maintaining a high cell recovery rate of up to ~65%.
Each functional GEM contains a single cell, a single Gel Bead, and RT reagents. Within each GEM reaction vesicle, a single cell is lysed, the Gel Bead is dissolved to free the identically barcoded RT oligonucleotides into solution, and reverse transcription of polyadenylated mRNA occurs. As a result, all cDNAs from a single cell will have the same barcode, allowing the sequencing reads to be mapped back to their original single cells of origin. The preparation of NGS libraries from these barcoded cDNAs is then carried out in a highly efficient bulk reaction.
See Appendix for more details.
Species name: human
Latin name: Homo sapiens
Specimens: liver tissue
Database | Web links | Version/date |
---|---|---|
Genome | ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz | v96 |
Gene Ontology (GO) | http://geneontology.org | 2016.04 |
KEGG | http://www.kegg.jp/kegg | 2016.05 |
Classes | Software | Version |
---|---|---|
Fundamental analysis | CellRanger | 5.0.1 |
Advanced analysis | Seurat | 3.1.1 |
Plots | R | 3.5.2 |
Overview of single-cell RNA sequencing QC results from CellRanger (web_summary.html).
Document location:
summary/1_Cellranger_result/gex/web_summary.html
Sample | Number of Reads | Valid Barcodes | Sequencing Saturation | Q30 Bases in Barcode | Q30 Bases in RNA Read | Q30 Bases in UMI |
con | 638,901,019 | 97.4% | 67.3% | 93.7% | 90.1% | 92.4% |
gex | 278,686,771 | 98.1% | 74.6% | 96.9% | 94.3% | 96.4% |
Document location:
summary/1_Cellranger_result/sample_sequence_stat.xlsx
Sample | Estimated Number of Cells | Mean Reads per Cell | Median Genes per Cell | Reads Mapped to Genome | Reads Mapped Confidently to Genome | Reads Mapped Confidently to Intergenic Regions | Reads Mapped Confidently to Intronic Regions | Reads Mapped Confidently to Exonic Regions | Reads Mapped Confidently to Transcriptome | Reads Mapped Antisense to Gene | Fraction Reads in Cells | Total Genes Detected |
con | 11,670 | 54,747 | 1,985 | 96.1% | 91.1% | 3.3% | 35.1% | 52.8% | 49.3% | 1.3% | 94.7% | 31,787 |
gex | 6,296 | 44,264 | 1,529 | 97.1% | 92.4% | 2.2% | 28.2% | 62.0% | 58.3% | 1.0% | 84.0% | 26,403 |
Document location:
summary/1_Cellranger_result/sample_align_stat.xlsx
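As a quick consistency check on the two tables above, Mean Reads per Cell should equal Number of Reads divided by Estimated Number of Cells, which is how Cell Ranger defines that metric. The sketch below recomputes it from the reported values. The function name is illustrative and is not part of CellRanger.

```python
def mean_reads_per_cell(total_reads: int, estimated_cells: int) -> int:
    """Total sequenced reads divided by estimated cells, rounded to the nearest read."""
    return round(total_reads / estimated_cells)


# Values copied from the CellRanger summary tables in this report.
samples = {
    "con": {"reads": 638_901_019, "cells": 11_670},
    "gex": {"reads": 278_686_771, "cells": 6_296},
}

for name, s in samples.items():
    print(name, mean_reads_per_cell(s["reads"], s["cells"]))
```

Both recomputed values agree with the reported 54,747 (con) and 44,264 (gex).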
Seurat makes it easy to explore QC metrics and filter cells. We visualized gene and molecule counts, plotted their relationship, and excluded cells with a clear outlier number of detected genes as potential multiplets. The following criteria were used to filter cells:
1. Gene counts per cell: 500 to Inf.
2. UMI counts per cell < Inf.
3. Percentage of mitochondrial genes < 25% (customized).
4. DoubletFinder was used to detect doublets.
Sample | before_filter_cell_num | after_filter_cell_num | percent |
gex | 6296 | 5911 | 93.89% |
con | 11670 | 9923 | 85.03% |
Document location:
src/summary_part/3_Cluster_result/sample_cells_filt_stat.xlsx
Document location:
summary/3_Cluster_result/Basicinfo_nGene_nUMI_pMito.png
summary/3_Cluster_result/Filter_Basicinfo_nGene_nUMI_pMito.png
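The threshold-based part of the filtering above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code (the report's filtering was done in Seurat). It assumes the lower gene bound of 500 is inclusive, the example cells are hypothetical, and doublet removal by DoubletFinder is not modeled.

```python
import math

MIN_GENES = 500        # criterion 1: lower bound (assumed inclusive)
MAX_GENES = math.inf   # criterion 1: no upper bound
MAX_UMI = math.inf     # criterion 2: no upper bound
MAX_PCT_MITO = 25.0    # criterion 3: customized threshold, in percent


def passes_qc(n_genes: float, n_umi: float, pct_mito: float) -> bool:
    """Apply the three threshold criteria to one cell."""
    return (
        MIN_GENES <= n_genes <= MAX_GENES
        and n_umi < MAX_UMI
        and pct_mito < MAX_PCT_MITO
    )


def percent_retained(before: int, after: int) -> float:
    """Percentage of cells kept after filtering, rounded to two decimals."""
    return round(100 * after / before, 2)


# Hypothetical cells: (genes detected, UMI count, percent mitochondrial).
print(passes_qc(1529, 4200, 6.1))    # typical cell
print(passes_qc(320, 900, 3.0))      # too few genes
print(passes_qc(2100, 8000, 31.5))   # too much mitochondrial expression

# Cell counts taken from the filtering table in this report.
print(percent_retained(6296, 5911), percent_retained(11670, 9923))
```

The retained percentages recomputed from the table are 93.89 for gex and 85.03 for con, matching the reported values.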
Cell clustering contains the following steps:
1) Normalizing the data
After removing unwanted cells from the dataset, the next step is to normalize the data. By default, we employ a global-scaling normalization method “LogNormalize” that normalizes the gene expression measurements for each cell by the total expression.
2) PCA (principal component analysis)
To overcome the extensive technical noise in any single gene for scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a ‘metagene’ that combines information across a correlated gene set.
3) Clustering cells
Seurat implements a graph-based clustering approach. Distances between cells are calculated from the previously identified PCs. Seurat's approach was heavily inspired by recent manuscripts that applied graph-based clustering to scRNA-seq data (SNN-Cliq) and CyTOF data (PhenoGraph). Briefly, these methods embed cells in a graph structure, for example a K-nearest neighbor (KNN) graph, with edges drawn between cells with similar gene expression patterns, and then attempt to partition this graph into highly interconnected quasi-cliques or communities. As in PhenoGraph, we first construct a KNN graph based on the Euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). To cluster the cells, we apply a modularity optimization technique (SLM) to iteratively group cells together, with the goal of optimizing the standard modularity function.
4) t-SNE (t-distributed Stochastic Neighbor Embedding) visualization
Seurat continues to use t-SNE as a powerful tool to visualize and explore these datasets. t-SNE aims to place cells with similar local neighborhoods in high-dimensional space together in low-dimensional space.
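Two of the steps above, log-normalization and neighborhood-overlap edge weights, can be illustrated with a small standard-library sketch. It is not Seurat's implementation. The scale factor of 10,000 is Seurat's documented default for "LogNormalize" (the report does not state the value used), the brute-force neighbor search and the choice to include each cell in its own neighbor set are simplifying assumptions, and the toy PCA coordinates are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Divide each count by the cell total, multiply by scale_factor, then log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def knn_sets(points, k):
    """For each point, the set holding itself and its k nearest neighbors (Euclidean)."""
    neighbors = []
    for p in points:
        by_distance = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        # With distinct points, the point itself sorts first (distance 0).
        neighbors.append(set(by_distance[: k + 1]))
    return neighbors


def jaccard(a, b):
    """Shared overlap of two neighbor sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


# Hypothetical cells in a 2-D "PCA space": two well-separated groups of three.
pca = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nbrs = knn_sets(pca, k=2)

print(log_normalize([1, 1, 2]))
print(jaccard(nbrs[0], nbrs[1]))  # same group: identical neighborhoods
print(jaccard(nbrs[0], nbrs[3]))  # different groups: no shared neighbors
```

Cells whose neighborhoods overlap strongly receive high edge weights, so a community detection step run on the weighted graph tends to keep them in the same cluster.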
Clusters | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | Total |
gex | 22(0.37%) | 2(0.03%) | 6(0.10%) | 5(0.08%) | 8(0.14%) | 1033(17.48%) | 1028(17.39%) | 13(0.22%) | 883(14.94%) | 844(14.28%) | 675(11.42%) | 174(2.94%) | 5(0.08%) | 0(0%) | 408(6.90%) | 287(4.86%) | 251(4.25%) | 17(0.29%) | 87(1.47%) | 81(1.37%) | 41(0.69%) | 2(0.03%) | 33(0.56%) | 6(0.10%) | 5911(100%) |
con | 2044(20.60%) | 1460(14.71%) | 1445(14.56%) | 1185(11.94%) | 1077(10.85%) | 11(0.11%) | 0(0%) | 1006(10.14%) | 3(0.03%) | 0(0%) | 4(0.04%) | 464(4.68%) | 592(5.97%) | 442(4.45%) | 3(0.03%) | 0(0%) | 2(0.02%) | 120(1.21%) | 2(0.02%) | 0(0%) | 5(0.05%) | 36(0.36%) | 0(0%) | 22(0.22%) | 9923(100%) |
Document location:
summary/3_Cluster_result/sample_cell_cluster_stat.xlsx
Document location:
summary/3_Cluster_result/sample_cell_cluster_stat.png
summary/3_Cluster_result/cluster_sample_stat.png
Document location:
summary/3_Cluster_result/tsne.png
Document location:
summary/3_Cluster_result/tsne.sample_split.png
We used the likelihood-ratio test (bimod [3]) to find genes differentially expressed in a single cluster compared with all other cells. Differentially expressed genes were identified using the following criteria:
1) p_value ≤ 0.01
2) log2FC ≥ 0.26 (about a 1.2-fold change), where log2FC is the log2 fold-change of the average expression between the two groups.
3) The gene is detected in > 10% of the cells in the specific cluster.
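The three criteria can be written as a simple predicate. This is a sketch only. It assumes log2FC is a plain log2 ratio of the two reported cluster means (the report does not say whether a pseudocount is added), and the detection fraction in the example is hypothetical because the marker table does not list it.

```python
import math

P_VALUE_MAX = 0.01   # criterion 1
LOG2FC_MIN = 0.26    # criterion 2
PCT_MIN = 0.10       # criterion 3, as a fraction of cells in the cluster


def log2_fold_change(target_mean: float, other_mean: float) -> float:
    """log2 ratio of the mean expression in the target cluster vs. all other cells."""
    return math.log2(target_mean / other_mean)


def is_marker(p_value: float, log2fc: float, pct_in_cluster: float) -> bool:
    """True when a gene meets all three differential-expression criteria."""
    return (
        p_value <= P_VALUE_MAX
        and log2fc >= LOG2FC_MIN
        and pct_in_cluster > PCT_MIN
    )


# Means for LTB in cluster 0, taken from the marker gene table in this report.
ltb_log2fc = log2_fold_change(16.19, 8.63)
print(round(ltb_log2fc, 2))

# p-value 0 as reported; the 0.5 detection fraction is a made-up illustration.
print(is_marker(0.0, ltb_log2fc, 0.5))
```

Under that assumption the LTB value rounds to 0.91, in line with the log2FC reported for that gene.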
Cluster | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
AL627309.1 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 0 | 0 | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
CICP27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0 | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0.00 | 0.00 | 0 | 0.03 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
AL627309.6 | 0.00 | 0.03 | 0 | 0.05 | 0 | 0.01 | 0.07 | 0 | 0.01 | 0.17 | 0.00 | 0 | 0 | 0.00 | 0 | 0.01 | 0 | 0 | 0.03 | 0 | 0 | 0 | 0.01 | 0 |
AL627309.7 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
AL627309.5 | 0.01 | 0.13 | 0.00 | 0.16 | 0.01 | 0.01 | 0.04 | 0.03 | 0.01 | 0.06 | 0.01 | 0.02 | 0.00 | 0.07 | 0.01 | 0 | 0.04 | 0.04 | 0.02 | 0.02 | 0.02 | 0.01 | 0 | 0 |
AL627309.4 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
WASH9P | 0 | 0 | 0 | 0 | 0 | 0.00 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 |
AP006222.1 | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.00 | 0.01 | 0 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
AL732372.2 | 0.00 | 0.01 | 0.01 | 0.02 | 0.00 | 0 | 0.01 | 0.00 | 0.00 | 0.01 | 0 | 0.01 | 0.00 | 0.01 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0.04 | 0 | 0 | 0 |
AL732372.3 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Document location:
summary/4_Markergene_result/all_cluster_gene_avg_exp.xlsx
Target_Cluster | GeneID | GeneName | Target_Cluster_mean | Other_Cluster_mean | log2FC | Pvalue | Qvalue | Description | GO | KEGG | KO_ENTRY | EC |
0 | ENSG00000168685 | IL7R | 12.52 | 6.24 | 1.01 | 0 | 0 | interleukin 7 receptor [Source:HGNC Symbol;Acc:HGNC:6024] | GO:0000018(regulation of DNA recombination);GO:0000902(cell morphogenesis);GO:0001915(negative regulation of T cell mediated cytotoxicity);GO:0002377(immunoglobulin production);GO:0003823(antigen binding);GO:0004896(cytokine receptor activity);GO:0004917(interleukin-7 receptor activity);GO:0005515(protein binding);GO:0005576(extracellular region);GO:0005886(plasma membrane);GO:0006955(immune response);GO:0007165(signal transduction);GO:0007166(cell surface receptor signaling pathway);GO:0008284(positive regulation of cell population proliferation);GO:0008361(regulation of cell size);GO:0009897(external side of plasma membrane);GO:0010628(positive regulation of gene expression);GO:0016020(membrane);GO:0016021(integral component of membrane);GO:0019221(cytokine-mediated signaling pathway);GO:0030217(T cell differentiation);GO:0030665(clathrin-coated vesicle membrane);GO:0033089(positive regulation of T cell differentiation in thymus);GO:0038111(interleukin-7-mediated signaling pathway);GO:0042100(B cell proliferation);GO:0048535(lymph node development);GO:0048872(homeostasis of number of cells);GO:0061024(membrane organization);GO:0070233(negative regulation of T cell apoptotic process);GO:1904894(positive regulation of receptor signaling pathway via STAT) | 04060(Cytokine-cytokine receptor interaction);04068(FoxO signaling pathway);04151(PI3K-Akt signaling pathway);04630(Jak-STAT signaling pathway);04640(Hematopoietic cell lineage);05200(Pathways in cancer);05340(Primary immunodeficiency) | K05072 | NA |
0 | ENSG00000227507 | LTB | 16.19 | 8.63 | 0.91 | 0 | 0 | lymphotoxin beta [Source:HGNC Symbol;Acc:HGNC:6711] | GO:0005102(signaling receptor binding);GO:0005125(cytokine activity);GO:0005164(tumor necrosis factor receptor binding);GO:0005515(protein binding);GO:0005575(cellular_component);GO:0005615(extracellular space);GO:0005886(plasma membrane);GO:0006955(immune response);GO:0007165(signal transduction);GO:0007267(cell-cell signaling);GO:0010467(gene expression);GO:0010469(regulation of signaling receptor activity);GO:0016020(membrane);GO:0016021(integral component of membrane);GO:0033209(tumor necrosis factor-mediated signaling pathway);GO:0043588(skin development);GO:0045084(positive regulation of interleukin-12 biosynthetic process);GO:0048535(lymph node development) | 04060(Cytokine-cytokine receptor interaction);04064(NF-kappa B signaling pathway);05323(Rheumatoid arthritis) | K03157 | NA |
0 | ENSG00000277734 | TRAC | 8.53 | 4.77 | 0.84 | 0 | 0 | T cell receptor alpha constant [Source:HGNC Symbol;Acc:HGNC:12029] | NA | NA | NA | NA |
0 | ENSG00000142541 | RPL13A | 39.76 | 23.28 | 0.77 | 0 | 0 | ribosomal protein L13a [Source:HGNC Symbol;Acc:HGNC:10304] | GO:0000184(nuclear-transcribed mRNA catabolic process, nonsense-mediated decay);GO:0003723(RNA binding);GO:0003729(mRNA binding);GO:0003735(structural constituent of ribosome);GO:0005634(nucleus);GO:0005730(nucleolus);GO:0005737(cytoplasm);GO:0005829(cytosol);GO:0005840(ribosome);GO:0005925(focal adhesion);GO:0006412(translation);GO:0006413(translational initiation);GO:0006417(regulation of translation);GO:0006614(SRP-dependent cotranslational protein targeting to membrane);GO:0015934(large ribosomal subunit);GO:0016020(membrane);GO:0017148(negative regulation of translation);GO:0019083(viral transcription);GO:0022625(cytosolic large ribosomal subunit);GO:0071346(cellular response to interferon-gamma);GO:0097452(GAIT complex);GO:1901194(negative regulation of formation of translation preinitiation complex);GO:1990904(ribonucleoprotein complex) | 03010(Ribosome) | K02872 | NA |
0 | ENSG00000127152 | BCL11B | 4.75 | 2.82 | 0.75 | 0 | 0 | BCL11B, BAF complex component [Source:HGNC Symbol;Acc:HGNC:13222] | GO:0000978(RNA polymerase II proximal promoter sequence-specific DNA binding);GO:0000981(DNA-binding transcription factor activity, RNA polymerase II-specific);GO:0001228(DNA-binding transcription activator activity, RNA polymerase II-specific);GO:0003334(keratinocyte development);GO:0003382(epithelial cell morphogenesis);GO:0003676(nucleic acid binding);GO:0003700(DNA-binding transcription factor activity);GO:0005515(protein binding);GO:0005634(nucleus);GO:0006355(regulation of transcription, DNA-templated);GO:0007409(axonogenesis);GO:0008285(negative regulation of cell population proliferation);GO:0009791(post-embryonic development);GO:0010468(regulation of gene expression);GO:0010837(regulation of keratinocyte proliferation);GO:0019216(regulation of lipid metabolic process);GO:0021773(striatal medium spiny neuron differentiation);GO:0021902(commitment of neuronal cell to specific neuron type in forebrain);GO:0021953(central nervous system neuron differentiation);GO:0031077(post-embryonic camera-type eye development);GO:0033077(T cell differentiation in thymus);GO:0033153(T cell receptor V(D)J recombination);GO:0035701(hematopoietic stem cell migration);GO:0042475(odontogenesis of dentin-containing tooth);GO:0043005(neuron projection);GO:0043066(negative regulation of apoptotic process);GO:0043368(positive T cell selection);GO:0043565(sequence-specific DNA binding);GO:0043588(skin development);GO:0045664(regulation of neuron differentiation);GO:0045944(positive regulation of transcription by RNA polymerase II);GO:0046632(alpha-beta T cell differentiation);GO:0046872(metal ion binding);GO:0048538(thymus development);GO:0071678(olfactory bulb axon guidance);GO:0097535(lymphoid lineage cell migration into thymus) | 05202(Transcriptional misregulation in cancer) | K22046 | NA |
0 | ENSG00000167286 | CD3D | 5.87 | 3.55 | 0.73 | 0 | 0 | CD3d molecule [Source:HGNC Symbol;Acc:HGNC:1673] | GO:0002250(adaptive immune response);GO:0002376(immune system process);GO:0003713(transcription coactivator activity);GO:0004888(transmembrane signaling receptor activity);GO:0005737(cytoplasm);GO:0005886(plasma membrane);GO:0007166(cell surface receptor signaling pathway);GO:0009897(external side of plasma membrane);GO:0016020(membrane);GO:0016021(integral component of membrane);GO:0030217(T cell differentiation);GO:0030665(clathrin-coated vesicle membrane);GO:0042101(T cell receptor complex);GO:0042105(alpha-beta T cell receptor complex);GO:0042803(protein homodimerization activity);GO:0045059(positive thymic T cell selection);GO:0045944(positive regulation of transcription by RNA polymerase II);GO:0046982(protein heterodimerization activity);GO:0050776(regulation of immune response);GO:0050852(T cell receptor signaling pathway);GO:0051260(protein homooligomerization);GO:0061024(membrane organization) | 04640(Hematopoietic cell lineage);04658(Th1 and Th2 cell differentiation);04659(Th17 cell differentiation);04660(T cell receptor signaling pathway);05142(Chagas disease (American trypanosomiasis));05162(Measles);05166(Human T-cell leukemia virus 1 infection);05169(Epstein-Barr virus infection);05170(Human immunodeficiency virus 1 infection);05340(Primary immunodeficiency) | NA | NA |
0 | ENSG00000008988 | RPS20 | 16.86 | 10.27 | 0.72 | 0 | 0 | ribosomal protein S20 [Source:HGNC Symbol;Acc:HGNC:10405] | GO:0000184(nuclear-transcribed mRNA catabolic process, nonsense-mediated decay);GO:0003723(RNA binding);GO:0003735(structural constituent of ribosome);GO:0005515(protein binding);GO:0005622(intracellular);GO:0005654(nucleoplasm);GO:0005737(cytoplasm);GO:0005829(cytosol);GO:0005840(ribosome);GO:0006412(translation);GO:0006413(translational initiation);GO:0006614(SRP-dependent cotranslational protein targeting to membrane);GO:0015935(small ribosomal subunit);GO:0016020(membrane);GO:0019083(viral transcription);GO:0022627(cytosolic small ribosomal subunit);GO:0070062(extracellular exosome) | 03010(Ribosome) | K02969 | NA |
0 | ENSG00000129824 | RPS4Y1 | 8.00 | 4.95 | 0.69 | 0 | 0 | ribosomal protein S4 Y-linked 1 [Source:HGNC Symbol;Acc:HGNC:10425] | GO:0000184(nuclear-transcribed mRNA catabolic process, nonsense-mediated decay);GO:0003723(RNA binding);GO:0003735(structural constituent of ribosome);GO:0005622(intracellular);GO:0005634(nucleus);GO:0005654(nucleoplasm);GO:0005829(cytosol);GO:0005840(ribosome);GO:0005844(polysome);GO:0006412(translation);GO:0006413(translational initiation);GO:0006614(SRP-dependent cotranslational protein targeting to membrane);GO:0007275(multicellular organism development);GO:0016020(membrane);GO:0019083(viral transcription);GO:0019843(rRNA binding);GO:0022627(cytosolic small ribosomal subunit) | 03010(Ribosome) | K02987 | NA |
0 | ENSG00000179144 | GIMAP7 | 5.42 | 3.38 | 0.68 | 0 | 0 | GTPase, IMAP family member 7 [Source:HGNC Symbol;Acc:HGNC:22404] | GO:0000166(nucleotide binding);GO:0003924(GTPase activity);GO:0005515(protein binding);GO:0005525(GTP binding);GO:0005737(cytoplasm);GO:0005783(endoplasmic reticulum);GO:0005794(Golgi apparatus);GO:0005811(lipid droplet);GO:0005829(cytosol);GO:0042802(identical protein binding);GO:0042803(protein homodimerization activity);GO:0043231(intracellular membrane-bounded organelle);GO:0046039(GTP metabolic process) | NA | NA | NA |
0 | ENSG00000160654 | CD3G | 3.93 | 2.46 | 0.68 | 0 | 0 | CD3g molecule [Source:HGNC Symbol;Acc:HGNC:1675] | GO:0002250(adaptive immune response);GO:0002376(immune system process);GO:0004888(transmembrane signaling receptor activity);GO:0005886(plasma membrane);GO:0005887(integral component of plasma membrane);GO:0007163(establishment or maintenance of cell polarity);GO:0007166(cell surface receptor signaling pathway);GO:0009897(external side of plasma membrane);GO:0015031(protein transport);GO:0016020(membrane);GO:0016021(integral component of membrane);GO:0030159(receptor signaling complex scaffold activity);GO:0030217(T cell differentiation);GO:0030665(clathrin-coated vesicle membrane);GO:0038096(Fc-gamma receptor signaling pathway involved in phagocytosis);GO:0042101(T cell receptor complex);GO:0042105(alpha-beta T cell receptor complex);GO:0042110(T cell activation);GO:0042608(T cell receptor binding);GO:0042803(protein homodimerization activity);GO:0045059(positive thymic T cell selection);GO:0046982(protein heterodimerization activity);GO:0050776(regulation of immune response);GO:0050852(T cell receptor signaling pathway);GO:0051260(protein homooligomerization);GO:0061024(membrane organization);GO:0065003(protein-containing complex assembly);GO:0070228(regulation of lymphocyte apoptotic process) | 04640(Hematopoietic cell lineage);04658(Th1 and Th2 cell differentiation);04659(Th17 cell differentiation);04660(T cell receptor signaling pathway);05142(Chagas disease (American trypanosomiasis));05162(Measles);05166(Human T-cell leukemia virus 1 infection);05169(Epstein-Barr virus infection);05170(Human immunodeficiency virus 1 infection) | NA | NA |
Document location:
summary/4_Markergene_result/Markergene_list.xlsx
Clusters | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
UP_gene_number | 196 | 662 | 207 | 655 | 181 | 254 | 752 | 370 | 777 | 772 | 293 | 281 | 544 | 575 | 408 | 383 | 466 | 771 | 728 | 503 | 419 | 1013 | 240 | 634 |
Document location:
summary/4_Markergene_result/Markergene_stat.xlsx
Document location:
summary/4_Markergene_result/Markergene_stat.png
The expression distribution of the top 10 marker genes was then visualized using heat maps, bubble diagrams, and other plots.
Document location:
summary/4_Markergene_result/all_cluster_markers_heatmap.png
Document location:
summary/4_Markergene_result/top10_marker_plot/cluster0_top10_marker.png
Document location:
summary/4_Markergene_result/top10_marker_plot/cluster0_top10_marker_tsne.png
Document location:
summary/4_Markergene_result/top10_marker_plot/cluster0_top10_marker_dotplot.png
Document location:
summary/4_Markergene_result/top10_marker_plot/cluster0_top10_marker_RidgePlot.png
Gene Ontology (GO) is an international standardized gene functional classification system that offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe the properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component, and biological process. The basic unit of GO is the GO term, and each GO term belongs to one of these ontologies.
GO enrichment analysis reports all GO terms that are significantly enriched in differentially expressed genes compared with the genome background, and filters the differentially expressed genes that correspond to biological functions. First, all differentially expressed genes were mapped to GO terms in the Gene Ontology database (http://www.geneontology.org/) and gene numbers were calculated for every term. Significantly enriched GO terms in differentially expressed genes compared with the genome background were then defined by a hypergeometric test. The P-value is calculated as:

$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
Here N is the number of all genes with GO annotation; n is the number of differentially expressed genes in N; M is the number of all genes that are annotated to a given GO term; m is the number of differentially expressed genes in M. A calculated p-value ≤ 0.05 was set as the threshold. GO terms meeting this condition were defined as significantly enriched GO terms in differentially expressed genes. This analysis identifies the main biological functions that the differentially expressed genes perform.
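The hypergeometric test described above can be sketched with the standard library alone. This is an illustration using the same symbols N, M, n, and m, not the pipeline's code, and the numbers in the example are made up.

```python
from math import comb


def enrichment_p_value(N: int, M: int, n: int, m: int) -> float:
    """Upper-tail hypergeometric probability P(X >= m).

    N: all annotated genes; M: genes annotated to the term;
    n: differentially expressed genes in N; m: differentially expressed genes in M.
    """
    upper = min(n, M)
    # Count the ways to draw n genes such that at least m of them belong to the term.
    favorable = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return favorable / comb(N, n)


# Toy example: 10 annotated genes, 4 in the term, 3 DE genes, 2 of them in the term.
p = enrichment_p_value(N=10, M=4, n=3, m=2)
print(p)
print(p <= 0.05)  # compare against the report's significance threshold
```

In the toy example, 40 of the 120 possible draws contain at least two term genes, so the p-value is 1/3 and the term would not pass the 0.05 threshold.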
Document location:
summary/5_Enrichment_result/1_GO_Enrichment
Genes usually interact with each other to carry out biological functions. Pathway-based analysis helps to further understand genes' biological functions. KEGG is the major public pathway-related database. Pathway enrichment analysis identifies metabolic or signal transduction pathways that are significantly enriched in differentially expressed genes compared with the whole-genome background. The P-value is calculated with the same hypergeometric formula as in the GO analysis:

$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
Here N is the number of all transcripts with KEGG annotation, n is the number of differentially expressed genes in N, M is the number of all transcripts annotated to a specific pathway, and m is the number of differentially expressed genes in M. A calculated p-value ≤ 0.05 was set as the threshold. Pathways meeting this condition were defined as significantly enriched pathways in differentially expressed genes.
Document location:
summary/5_Enrichment_result/2_KEGG_Enrichment
SingleR performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.
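The core idea, scoring each cell against reference profiles of pure cell types and taking the best match, can be sketched with a rank correlation. This is a heavily simplified illustration, not SingleR's implementation, which also restricts the comparison to informative genes and fine-tunes labels iteratively. The reference profiles and the query cell below are hypothetical toy values.

```python
import math


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def annotate(cell, reference):
    """Return the best-matching label and the score for every reference label."""
    scores = {label: spearman(cell, profile) for label, profile in reference.items()}
    return max(scores, key=scores.get), scores


# Gene order: CD3D, IL7R, ALB, APOA1 (toy expression values, not real data).
reference = {"T_cells": [9, 8, 1, 0], "Hepatocytes": [0, 1, 9, 8]}
label, scores = annotate([5, 7, 0, 1], reference)
print(label, scores)
```

Because each cell is scored independently of the others, the assigned labels do not depend on the clustering result.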
Document location:
summary/6_SingleR_result/cell_annot_HPCA_label.main_cell_pie.png
Document location:
summary/6_SingleR_result/cell_annot_HPCA_label.main_cell_tsne.png
Notice:
The cell types included in the HPCA reference used by SingleR are limited. Other tools or methods (such as marker-gene-database-based, correlation-based, or supervised-classification-based approaches) are recommended for identifying tissue-specific cell types or cell subtypes.
[1] CellRanger: http://support.10xgenomics.com/single-cell/software/overview/welcome.
[2] Seurat: Butler A, Hoffman P, Smibert P, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species[J]. Nature Biotechnology, 2018.
[3] bimod: McDavid A, Finak G, Chattopadyay P K, et al. Data Exploration, Quality Control and Testing in Single-Cell qPCR-Based Gene Expression Experiments[J]. Bioinformatics, 2013, 29(4):461-467.
[4] t-SNE: Maaten L V D, Hinton G. Visualizing Data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(2605):2579-2605.
[5] KEGG: Kanehisa M, Araki M, et al. KEGG for linking genomes to life and the environment[J]. Nucleic Acids Research, 2008.
[6] Zheng G X, Terry J M, Belgrader P, et al. Massively parallel digital transcriptional profiling of single cells[J]. Nature Communications, 2017, 8:14049.
[7] Cannoodt R, Saelens W, Saeys Y. Computational methods for trajectory inference from single-cell transcriptomics[J]. European Journal of Immunology, 2016, 46(11):2496-2506.
Address: 2575 West Bellfort Street, Suite 270, Houston, TX, 77054 USA
Website: http://www.lcsciences.com/
Email: support@lcsciences.com
Tel: (713) 664-7087