Validation of a large cell scRNA-seq method
We utilized a large particle FACS (LP-FACS) method testing two different approaches to isolated and dissociated FDB myofibers. In the first approach, we dissected the FDB from tendon to tendon prior to digestion, enabling isolation of fully intact myofibers. In the second approach, we cut small portions of the FDB muscle using scissors. We reasoned that the latter approach would broadly mimic skeletal muscle needle biopsies as might be done, for example, from a human patient sample. We isolated single myofibers through LP-FACS, using a flow channel size of 500 μm. The COPAS SELECT Flow Pilot Platform was employed. Using time-of-flight (TOF, measuring axial length) and optical extinction (EXT, measuring optical density) parameters, we found that skeletal myofibers separated into three populations—an EXT-low population, EXT-high/TOF-low population, and EXT-high/TOF-high population (Fig. 1a). The EXT-high/TOF-high population was comprised almost entirely of intact myofibers with lengths > 400 μm, suggesting successful sorting of large myofibers (Fig. 1b). Interestingly, the EXT-high/TOF-low population was composed of what appeared to be rod-shaped fragments that maintained sarcomeric proteins, albeit disrupted (Fig. 1c). The EXT-low population was comprised mostly of debris and dead cells, as previously observed with cardiac myocytes (Fig. 1d). The EXT-high/TOF-low population qualitatively resembled our pseudo-biopsy isolated myofiber fragments (Fig. 1e), which also shared similar TOF and EXT parameters (not shown). To our knowledge, this is the first FACS-based single cell RNA-seq study of skeletal myofibers; thus, we adopted a broad gating strategy for isolation of single cells. We sorted 700 EXT-high myofibers (comprised of both TOF-high and TOF-low populations) as well as 100 myofiber fragments isolated through the pseudo-biopsy method.
The two gates and pseudo-biopsy approach were used to isolate 763 cells for single cell RNA-seq using the established mcSCRB-seq protocol [21, 22]. The entire group of 763 cells/cell fragments were sequenced to a median depth of 108,110 reads per cell. Preliminary analyses, however, indicated a distinct cluster of cells with a high percentage of mitochondrial reads (Fig. 1f) or otherwise low abundance reads (median 12,187 per cell). Notably, almost all of our pseudo-biopsy myofiber fragments and many TOF-low cells fell into this category. These quality control metrics likely indicated poor quality or sheared cells with loss of RNA. Thus, we excluded these cells, identified the EXT-high and TOF-high gate as the appropriate gate to obtain high quality myofibers, and narrowed our analysis to the best 171 cells (>5000 genes expressed and <20% mitochondrial genes) remaining with a median read count of 239,572 per cell.
Analysis of the expression patterns of single FDB myofibers
A median of 12,187 transcripts were identified in these myofibers and all had the expression patterns of mature skeletal myofibers, highly expressing a myosin heavy chain isoform.
We used the top 4 significant PCs to cluster these cell types (Fig. 2a). Three groups were observed in a UMAP dimensionality reduction plot. Two clusters, containing 69 and 53 cells respectively (71% of all cells) had elevated expression of Myh1 and Myh8 identifying these groups as fast 2X type cells. MYH8, while considered a neonatal myosin, maintains low expression in adult skeletal muscle [32, 33]. A third cluster containing 49 cells was defined by high expression of Tnnt1 and Myh2. A deeper analysis of this group showed that 12 cells had high to modestly elevated Myh7 expression (a slow-twitch marker), indicating this cluster was a combination of slow-fibers cells and fast 2A fibers (Fig. 2b, c). Of note, Myh4, a myosin heavy chain associated with fiber type 2B, was the dominant myosin in only a single cell that was assigned to this group (Fig. 2b, c) [2]. As the FDB is a fast-fiber muscle, the overall distribution of significantly more fast (159) to slow fibers (12) is consistent with expected.
Interestingly, the expression patterns of the main fast-/slow-fiber differentiating Myh genes was not as dichotomous as noted in protein based fiber type data [34]. Here there were many more cells with intermediate levels and coexpression of Myh1 and Myh2 suggesting higher gene plasticity and more cell hybrids (Fig. 2b) [2].
Shared and variable transcripts by cell type
We wondered about the extent to which highly abundant genes were mosaic across these cell fiber states. By normalized read counts of the scRNA-seq data, we determined the 50 most abundant transcripts by the average of each cell type in the two fast 2X clusters and the one fast 2A / slow cluster determined by Seurat (Suppl. Table 1). The overall most abundant transcripts were Ttn, Acta1, and mt-Rnr2. There was significant overlap of abundant genes, with only 9 genes being different across the three samples. We then explored differences specifically between the two most abundant fiber types, fast 2A and fast 2X. Of 2649 evaluated genes (all expressed in ≥95% of cells of one cluster), 160 genes were differentially expressed across the two groups (t-test, adj. p value <0.01). This included expected genes such as Tnni1, Tnnt1, and Myh1 and less investigated genes such as Ubash3b and Togaram2 (Fig. 2d, Suppl. Table 2).
We validated a subset of these genes (Eno3, Fhl1, Got2, Myh2) using RISH and available probes across the extensor digitorum longus (EDL), gastrocnemius, and soleus. Eno3 is a known fast fiber gene (both 2A and 2X) and was identified in most cells of the fast-twitch EDL and gastrocnemius (Fig. 3a, b). Fhl1 was identified as being elevated in fast 2A myofibers (Fig. 2d, Suppl. Table 2). In Fhl1-positive myofibers, Eno3 qualitative expression was reduced, but not absent. In the slow-twitch soleus (Fig. 3c), levels of both genes were decreased. Of 693 myofibers reviewed across all of the tissues, most (341, 49%) showed co-expression, with 186 cells being Eno3+ only, 92 being Fhl1+ only and 74 having no expression. A χ2 analysis demonstrated only a modest enrichment for co-expression (χ2=4.25, p = 0.039). Fhl1 was then compared to Myh2, a known fast 2A gene (Fig. 3d–f). The strong pre-mRNA Myh2 staining was interpreted as nuclear [15, 16]. The expression of the two genes demonstrated appropriate overlap in the same cells (206 co-expressed, 195 non-expressed, and 30 counter-expressed, χ2 = 320.9, p = 9.3e-72). Got2, also identified as elevated in fast 2A fibers, showed appropriate co-expression with Myh2 across all three tissues (Fig. 3g–i) and by myofiber (81 co-expressed, 69 non-expressed, 22 counter-expressed, χ2 = 98.1, p = 4.0e−23). These patterns of fast fiber expression are consistent with those identified by scRNA-seq (Fig. 2d).
Comparison of full cell scRNA-Seq to nuclear RNA-Seq
A recent publication by Dos Santos et al. [16] described nuc-seq of mouse skeletal muscles from a mixed sample of tibialis, gastrocnemius, soleus, plantaris, and extensor digitorum longus (N=6 each), along with nuclei from each of quadriceps, tibialis, and soleus, identifying myonuclei based on Ttn expression. Whereas we focused on sequencing depth (239,572 median reads/171 cells), Dos Santos et al. went wide, obtaining many more skeletal myofiber nuclei (6962), but only to a median read count of 2785 and 1210 transcripts per nucleus in their mixed muscle sample. We processed this dataset using Seurat and determined, as they reported, the presence of slow, fast 2A, fast2A/2X, fast 2B, and fast 2X nuclei clustering more distinctly by myosin heavy chain status on a UMAP visualization of the data, than our whole scRNA-seq data (Suppl. Fig. 1).
As the whole cell versus nuclear isolation methods were so distinct, we evaluated how those differences affect the presence of abundant genes. Notably, in a comparison of the most highly expressed genes, only 27 were present in the top 100 for both methods. A GO search of the 73 genes that were only abundant in the whole cell scRNA-seq showed these genes were enriched for terms such as “myofibril” and “ATP metabolic process.” This had us wonder if we could observe differences in gene classes based on the nuc-seq vs scRNA-seq methods similar to that described in other cell types [35]. We used normalized expression data between the studies and determined the expression differences between whole cell and nuclear data for the genes representing the GO terms of transcription factors, cell cycle genes, mitochondria, and actin-cytoskeleton. Both transcription factors and cell cycle genes were more abundant in the nuc-seq data (3.9 and 1.9 fold respectively), while actin-cytoskeleton genes were more abundant in scRNA-seq data (1.9 fold). Mitochondrial genes were equivalent for expression across the two methods. However, two of the most abundant scRNA-seq genes overall, the mitochondrial genes mt-Rnr1 and mt-Rnr2, were not present in the nuc-seq data, impacting these results. Also of note, Malat1, a known nuclear lncRNA, was the most abundant transcript in the nuc-seq data, consistent with a prior report [35].
We compared a combined slow/fast 2A group against a fast 2X group to again discern differences in myofiber-specific genes by sequencing method. There were 771 differentially expressed genes (t-test, adj. p value <0.01) in comparison to 152 in the whole cell dataset. Only 80 differentially expressed genes overlapped between the two datasets and of these, three were significant in opposite fold directions (Crim1, Myh4, Kcnc1) (Suppl. Table 3). Altogether, these data indicate the ability to discern cell myofiber types, by either nuc-seq or scRNA-seq, despite differences in the specific genes that discriminate across the slow/fast 2A and fast 2X cells and the relative expression levels of genes, by the two methods of RNA-seq.
Is there a meaningful difference between fast 2X subclusters?
The initial Seurat analysis subsetted the fast 2X cluster into two groups. We explored if these two fast 2X clusters (cluster 1 - 2Xc1; cluster 2 – 2Xc2) represent unique cell types, cell states, or some technical division. Of 5260 genes compared, 557 genes were differentially expressed (t test; adj. p value <0.01). A GO analysis on the 557 genes identified an enrichment of the cellular component “neuronal synapse,” suggesting variability at the NMJ. A further review of the top significant genes showed that >20 genes appear to have neuronal origins (Cdh4, Cdkl5, Cntn4, Dscam, Gabbr2, Kirrel3, Lingo2, Lrp1, L1cam, Nrcam, Ntn1, Ntrk3, Ptprt, Ptpro, Robo2, Sdk1, Sema5a, Sema6d, Shank2, Sox5, Tnr, and Wwox). We attempted to exclude technical reasons for this variability before investigating a biological rationale for the division.
First, we noted that the vast majority of the neuronal genes (20/22) were present in at least 120 of the 171 cells (Suppl Fig. 2).We surmised that some degree of ambient RNA was present [36]. We then performed RISH for two of these genes, Gabbr2 and Ntrk3, showing robust neuronal staining (Fig. 3j) and some Gabbr2, but no Ntrk3 in myofibers (Fig. 3k). We further noted Pecam1 and Smtn, as markers of endothelial cells and smooth muscle cells respectively, showed comparable increases in these genes among the fast 2Xc2 cells. These data indicated that despite extremely low expression, ambient genes, in general, have slightly elevated values in fast 2Xc2 cells, perhaps consistent with more genes being detected in these cells. We further note that three of the most abundant genes, Ttn, mt-Rnr1, and mt-Rnr2 are conversely lower in fast 2Xc2 cells (Suppl Fig. 2). We take the summation of this data to indicate that the separation of fast 2Xc1 and 2Xc2 is a technical artifact related to the sequencing and not a true biological distinction.