Beginner’s Guide to Understanding Single-Cell ATAC-Seq
September 30, 2020
The assay for transposase-accessible chromatin using sequencing (ATAC-Seq) has been widely adopted by the scientific community since its development by Jason Buenrostro and colleagues in the Greenleaf lab in 2013, and it’s now one of the most popular approaches to investigate epigenetic profiles.
The ATAC-Seq assay enables the identification of open chromatin regions, which are generally associated with transcriptionally active genes, from low numbers of cells. Transcription factor binding sites and positions of nucleosomes can also be identified from the analysis of ATAC-Seq data, potentially allowing important genetic pathways in the samples to be elucidated.
Over the last decade, the emergence of single-cell technologies has opened the doors to new discoveries in heterogeneous tissues and other complex sample types. However, the first single-cell technology was developed in 1972, with the Giemsa staining of chromosomes to determine karyotype. Despite the low resolution of this technique (50-100 megabases), identification of karyotypes allowed the detection of chromosomal amplifications, deletions, and translocations at a genome-wide level in individual cells and became widely adopted for use in diagnostics, in particular for leukemias.
With the development of next-generation sequencing (NGS), genome-wide sequence analysis at single-nucleotide resolution has now become mainstream. NGS technology has allowed the rise of the barcoding-based single-cell sequencing technologies that have been adapted to RNA-Seq, DNA-Seq, and ATAC-Seq.
In this article, we focus on the single-cell ATAC-Seq (scATAC-Seq) technology to describe how it works and highlight the benefits and the drawbacks of scATAC-Seq relative to bulk ATAC-Seq.
What is ATAC-Seq?
The ATAC-Seq method is a genome-wide NGS-based assay that characterizes chromatin states in cell and tissue samples. In particular, ATAC-Seq is used to identify regions of the genome that have open chromatin states that are generally associated with sites undergoing active transcription.
The sequencing of open chromatin regions that are being transcribed can lead to the identification of transcription factors that are active in the phenotype or conditions being investigated. ATAC-Seq assays are also powerful because they can be used to determine nucleosome positioning.
The ATAC-Seq protocol shares some similarities with DNase-Seq and FAIRE-Seq assays, but offers several advantages such as a more user-friendly protocol and compatibility with lower numbers of cells.
ATAC-Seq has become a common first step in epigenomic analysis, one that generates many hypotheses about the molecular mechanisms responsible for regulating different cellular processes. The initial findings from ATAC-Seq assays can be validated and extended by performing other techniques, such as reporter assays, chromatin immunoprecipitations, and DNA methylation assays.
The ATAC-Seq Protocol
The first step in the ATAC-Seq protocol is the incubation of samples with the prokaryotic transposase enzyme Tn5, which can only access open chromatin regions. The Tn5 enzyme used in ATAC-Seq is “loaded” with NGS adapters, creating what is referred to as an assembled transposome, allowing the enzyme to simultaneously perform both digestion and library preparation reactions in a process called tagmentation. After amplification, the library can be sequenced.
ATAC-Seq displays several advantages as compared to other techniques focusing on chromatin state, including ChIP-Seq. Because of the straightforward processing (only 2 steps: tagmentation and amplification), ATAC-Seq is compatible with low amounts of starting material (~50,000 cells). This allows the analysis of precious samples, such as patient tumors or primary cells. Furthermore, the samples don’t need to be fixed for ATAC-Seq, so the native, biologically relevant chromatin state is analyzed. This is in contrast to ChIP reactions, which generally require that the samples be crosslinked. Finally, ATAC-Seq does not require the use of any antibodies that could introduce variability in the experiment due to lack of specificity or that might simply not work well in fixed conditions.
Two factors have to be taken into account when performing ATAC-Seq. The first is the number of cells relative to the amount of Tn5. Using too many cells can result in incomplete digestion, while using too few can result in over-digestion. The second factor that must be considered is cell death. Apoptosis can result in the presence of degraded cell-free DNA that can undergo tagmentation by the Tn5 enzyme, which would produce false sequencing signals. This problem can be avoided by DNase treatment before tagmentation.
The Rise of Single-Cell ATAC-Seq Assays
Bulk ATAC-Seq, despite all the advantages, and like other genome-wide technologies, cannot determine the chromatin states of individual subpopulations of cells within a sample. To identify open chromatin in heterogeneous populations, such as blood, pancreas, and brain, ATAC-Seq analysis has to be performed at a single-cell level.
In 2015, two groups published their results on adapting ATAC-Seq for single-cell studies at around the same time. The Greenleaf lab published their paper in Nature, and the Shendure lab published theirs in Science.
They developed two different strategies. In Greenleaf’s approach, individual cells are captured in microfluidic chambers and undergo tagmentation, followed by library amplification, during which cell-identifying barcoded primers are added. They tested this protocol in different cell lines, starting with GM12878 lymphoblastoid cells. The scATAC-Seq data aggregated from several individual chambers recapitulated the bulk ATAC-Seq and DNase-Seq data. Chromatin accessibility changes were associated with specific trans-factors and cis-elements. Similar results were obtained in H1 human embryonic stem cells, K562 chronic myelogenous leukemia cells, V6.5 mouse ESCs, EML1, TF-1, HL-60, and BJ fibroblasts.
The protocol from Shendure’s lab is based on a 2-step cellular indexing strategy and is much more time-consuming. First, 2,500 nuclei are loaded in a 96-well plate and undergo tagmentation using Tn5 loaded with unique adaptors. After these first pools with distinct barcodes are obtained, the nuclei are combined, FACS-sorted into a new microplate, and lysed, and the DNA is amplified with primers that add a second barcode. The low number of nuclei in each well ensures that most nuclei receive a unique barcode combination during amplification. They first tested this approach with an equal mix of GM12878 human nuclei and Patski mouse nuclei and extended it to other cell lines, including human HEK293T and HL-60 nuclei mixed with GM12878. They were able to discriminate the chromatin state between the different cell lines and within the cell lines with a collision rate of 11%.
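To get a feel for why the number of nuclei per well matters in a 2-step indexing scheme, the collision probability can be approximated with a simple occupancy model. The sketch below is an illustration only: it assumes first-round barcodes are uniformly distributed among the sorted nuclei, and the inputs (96 first-round barcodes, 5 to 25 nuclei per second-round well) are hypothetical values, not figures taken from the published protocol. It is not meant to reproduce the 11% reported above.

```python
def expected_collision_rate(n_barcodes: int, nuclei_per_well: int) -> float:
    """Probability that a given nucleus shares its first-round barcode with
    at least one other nucleus sorted into the same second-round well.

    Assumes each nucleus carries one of `n_barcodes` first-round barcodes
    with equal probability (a simplifying assumption).
    """
    if n_barcodes < 1 or nuclei_per_well < 1:
        raise ValueError("n_barcodes and nuclei_per_well must be >= 1")
    # Each of the other (nuclei_per_well - 1) nuclei misses our barcode
    # with probability (1 - 1/n_barcodes).
    p_no_match = (1.0 - 1.0 / n_barcodes) ** (nuclei_per_well - 1)
    return 1.0 - p_no_match


if __name__ == "__main__":
    # Hypothetical settings: 96 first-round barcodes, varying nuclei per well.
    for k in (5, 15, 25):
        rate = expected_collision_rate(96, k)
        print(f"{k} nuclei per well: expected collision rate {rate:.1%}")
```

Under this model, sorting fewer nuclei per well lowers the collision rate but also lowers the number of cells recovered per plate, which is the trade-off behind the low cell concentration mentioned above.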
Since then, several homemade protocols and improvements of the technology have been published and two companies, 10x Genomics and Bio-Rad, offer specific devices and reagents to perform scATAC-Seq assays.
How Do Single-Cell ATAC-Seq Protocols Work?
Single-cell ATAC-Seq protocols generally rely on performing ATAC-Seq reactions using the Tn5 transposase loaded with the sequencing adapters followed by single-cell labeling and amplification of the library.
The first and most crucial step in scATAC-Seq assays is the isolation of nuclei. To isolate nuclei, the plasma membrane needs to be disrupted while keeping the nuclear membrane intact. Different protocols have been published for fresh or frozen cells and tissue.
In 2017, Ryan Corces et al. published a detailed protocol in Nature Methods to isolate nuclei from frozen tissue for use in different epigenomic assays, including scATAC-Seq. This protocol is based on Dounce homogenization and density gradient centrifugation. It also relies on a lysis buffer containing digitonin (to permeabilize the plasma membrane) and non-ionic detergent (to maintain native interactions), followed by centrifugation to remove the cytoplasmic fraction.
Regardless of the protocol followed, some basic points must be considered for best results. The quality and viability of the sample at the time of freezing, as well as at thawing, are crucial to avoid cell/tissue death. The sample viability needs to be >80% to avoid sequencing noise due to the tagmentation of cell-free DNA from the dead cells. Furthermore, the process must be performed at 4°C to maintain the cells in their “native” state. In Corces’s protocol, the authors recommend pre-chilling all the material, including pestles and tubes.
The published protocols are a helpful starting point, but lysis time, centrifugation speed/time, and filtration steps will have to be optimized for every cell/tissue type. 10x Genomics also provides protocols adaptable to fresh/frozen cells, tissues, and low input samples.
Tagmentation and Library Preparation
After the nuclei are isolated, tagmentation can be performed. This step is similar to the tagmentation step in the bulk ATAC-Seq protocol. Tn5 is loaded with sequencing indexes and added to the sample. The Tn5 enters the nuclei (with the nuclear membrane still intact), fragments DNA in open chromatin regions, and adds the indexing adapters. To ensure the best tagmentation success, the ratio of nuclei number to Tn5 quantity must be optimized.
The single-cell barcoding step consists of labeling each cell individually by adding unique barcodes to identify each cell during bioinformatic analysis. Depending on the protocol, you can use fluidic devices, microplates, or other systems that allow separating each nucleus individually.
10x Genomics and Bio-Rad developed specific devices based on fluidics to facilitate this step. Briefly, barcodes and individual nuclei are encapsulated either in gel beads or with oil using a specific microfluidic cartridge. Nuclei are delivered at limiting dilution so that most occupied capsules contain only one nucleus. Once one nucleus and one barcode are encapsulated, an enzymatic reaction links the barcode to the sequencing indexes.
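The effect of limiting dilution can be illustrated with a Poisson loading model, which is a common approximation for droplet encapsulation. This is a minimal sketch under that assumption; the mean loading values used here are hypothetical and are not vendor specifications.

```python
import math


def loading_fractions(mean_nuclei_per_capsule: float) -> dict:
    """Fractions of capsules that are empty, hold exactly one nucleus, or
    hold several, assuming nuclei are distributed by a Poisson process."""
    lam = mean_nuclei_per_capsule
    if lam <= 0:
        raise ValueError("mean_nuclei_per_capsule must be positive")
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of nucleus-containing capsules that hold more than one nucleus.
        "multiplet_rate": p_multiple / (1.0 - p_empty),
    }


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.5):  # hypothetical mean loadings
        f = loading_fractions(lam)
        print(f"mean={lam}: single={f['single']:.3f}, "
              f"multiplet rate={f['multiplet_rate']:.3f}")
```

The model shows the trade-off: a lower mean loading reduces the share of capsules containing more than one nucleus, at the cost of leaving most capsules empty.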
Once barcodes are linked to DNA fragments, the capsule is dissolved, and the libraries are amplified by PCR prior to sequencing. The library is assessed for quality and fragment length. High-quality libraries have a broad size distribution of ~250–2,000 bp and an average fragment length of 400–800 bp.
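The size criteria above can be turned into a quick check. The sketch below assumes fragment lengths are already available as a list of integers (for example, exported from a fragment analyzer). The function name and the 90% in-range cutoff are illustrative assumptions, not part of any published protocol.

```python
from statistics import mean


def library_size_qc(fragment_lengths_bp, min_in_range_fraction=0.9):
    """Compare a library's fragment lengths with the ranges quoted in the text:
    most fragments within 250-2,000 bp and a mean length of 400-800 bp.
    `min_in_range_fraction` is a hypothetical cutoff chosen for illustration."""
    if not fragment_lengths_bp:
        raise ValueError("fragment_lengths_bp must not be empty")
    avg = mean(fragment_lengths_bp)
    in_range = sum(250 <= n <= 2000 for n in fragment_lengths_bp)
    fraction = in_range / len(fragment_lengths_bp)
    return {
        "mean_bp": avg,
        "fraction_250_2000": fraction,
        "passes": 400 <= avg <= 800 and fraction >= min_in_range_fraction,
    }


if __name__ == "__main__":
    print(library_size_qc([300, 500, 700, 900]))
    print(library_size_qc([100, 150, 200, 250]))
```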
The libraries are dual-indexed and can be sequenced with paired-end reads. The recommended sequencing depth is 25,000 read pairs per nucleus.
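As a rough planning aid, the recommended depth translates directly into a total read-pair budget. The sketch below uses the 25,000 read pairs per nucleus figure from the text; the nuclei count and the per-run sequencer yield are hypothetical inputs to be replaced with the values for your experiment and instrument.

```python
import math


def required_read_pairs(n_nuclei: int, read_pairs_per_nucleus: int = 25_000) -> int:
    """Total read pairs needed to reach the target depth for every nucleus."""
    return n_nuclei * read_pairs_per_nucleus


def runs_needed(total_read_pairs: int, read_pairs_per_run: int) -> int:
    """Number of sequencing runs needed, rounding up to whole runs."""
    return math.ceil(total_read_pairs / read_pairs_per_run)


if __name__ == "__main__":
    n_nuclei = 5_000            # hypothetical number of recovered nuclei
    run_yield = 400_000_000     # hypothetical read pairs per sequencing run
    total = required_read_pairs(n_nuclei)
    print(f"{total:,} read pairs, {runs_needed(total, run_yield)} run(s)")
```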
Several software tools exist to analyze scATAC-Seq data, including chromVAR, SCRAT, and SCALE. Pre-processing first consists of demultiplexing the data if multiple samples have been sequenced at the same time. Sequencing adaptors and primers are trimmed, and the reads are aligned to the relevant reference genome. Before further data analysis, the quality of the single-cell sequencing is assessed to filter out barcodes corresponding to low-quality cells or doublets. After peak calling and annotation, regulatory elements (TF motifs, TSSs) are used to generate a cell-by-feature matrix. Batch correction and data integration are then followed by 2D/3D visualization and clustering of the data. Genome annotation of these clusters can reveal cell identity as well as chromatin accessibility dynamics.
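The barcode filtering and cell-by-feature matrix steps can be sketched in a few lines of plain Python. This is a toy illustration, not the method used by any of the tools named above. The input layout (tuples of chromosome, start, end, barcode), the function name, and the two thresholds are assumptions made for the example, and the fraction of fragments in peaks is used here as one possible quality metric.

```python
from collections import Counter, defaultdict


def cell_by_peak_matrix(fragments, peaks, min_fragments=2, min_frac_in_peaks=0.5):
    """Count fragments overlapping each peak per barcode, then keep only
    barcodes with enough fragments and a high enough fraction in peaks.

    fragments: iterable of (chrom, start, end, barcode), half-open coordinates
    peaks:     list of (chrom, start, end)
    Returns (kept_barcodes, matrix) where matrix[i][j] is the number of
    fragments from kept_barcodes[i] overlapping peaks[j].
    """
    total = Counter()      # fragments per barcode
    in_peaks = Counter()   # fragments per barcode overlapping any peak
    counts = defaultdict(lambda: [0] * len(peaks))

    for chrom, start, end, barcode in fragments:
        total[barcode] += 1
        hit = False
        for j, (p_chrom, p_start, p_end) in enumerate(peaks):
            if chrom == p_chrom and start < p_end and p_start < end:
                counts[barcode][j] += 1
                hit = True
        if hit:
            in_peaks[barcode] += 1

    kept = sorted(
        bc for bc in total
        if total[bc] >= min_fragments
        and in_peaks[bc] / total[bc] >= min_frac_in_peaks
    )
    return kept, [counts[bc] for bc in kept]


# Toy data: two peaks and three barcodes (all values are made up).
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
fragments = [
    ("chr1", 120, 180, "AAAC"),
    ("chr1", 510, 590, "AAAC"),
    ("chr1", 150, 250, "AAAC"),
    ("chr1", 130, 170, "CCGT"),
    ("chr1", 900, 950, "CCGT"),
    ("chr1", 800, 850, "GGTA"),  # single off-peak fragment, filtered out
]

if __name__ == "__main__":
    barcodes, matrix = cell_by_peak_matrix(fragments, peaks)
    for bc, row in zip(barcodes, matrix):
        print(bc, row)
```

Real datasets contain millions of fragments, so dedicated tools use indexed interval lookups and sparse matrices rather than the nested loop shown here.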
Pros & Cons of scATAC-Seq
ATAC-Seq is an easy way to analyze chromatin state, but bulk ATAC-Seq gives a general overview of the open chromatin without differentiating the cell type/stage. The main benefit of the development of single-cell ATAC-Seq technology is that this method has allowed the identification of open chromatin in heterogeneous or complex tissue and cell samples. Many biological samples, such as tumors and tissues in different developmental states, contain multiple subpopulations of cells that potentially have different epigenomic profiles. scATAC-Seq allows the identification of such cell subpopulations, and thus gives the most accurate description of chromatin state in these dynamic processes.
As compared to bulk ATAC-Seq, single-cell technology requires much more starting material. Another challenging aspect of scATAC-Seq is that minor subpopulations can be hard to detect unless deeper sequencing is performed. Furthermore, the barcoding step requires specific devices and consumables that can be expensive, making scATAC-Seq difficult to perform for some researchers. Finally, bioinformatics analysis is much more complicated since every cell sequencing track needs to be analyzed individually and the different clusters identified. However, 10x Genomics and Bio-Rad offer software dedicated to single-cell sequencing analysis.
Altogether, scATAC-Seq is more time-consuming, requires more sample, and is more expensive than bulk ATAC-Seq, but is more informative and may be required for the most relevant epigenomic analysis of complex sample types.
Discoveries Enabled by scATAC-Seq
Lung Adenocarcinoma and Tumor Progression
Cancer development is due to a succession of genomic mutations that lead to uncontrolled proliferation and cell migration. However, epigenetic dysregulation, as well as environmental factors, also play important roles in cancer progression. These successive alterations that are characteristic of cancer generate a heterogeneous pool of cells at different stages of cancer.
LaFave LM. et al. used a mouse model of lung adenocarcinoma to characterize tumor progression. To investigate the diversity of cells in the tumors, Tyler Jacks’s lab used single-cell combinatorial indexing (sci) ATAC-Seq with normal lung and lung adenocarcinoma cells from individual tumors, whole tumor-bearing lungs, and metastases. As expected, they found evidence of lineage loss and cell plasticity. Interestingly, in the primary tumors, they found chromatin states similar to those of metastatic cells, but at low frequency. Whereas primary tumors were rather heterogeneous, metastatic samples were more homogeneous, suggesting that metastases eventually settle into a stable chromatin state. By combining transcription factor activity, they highlighted 11 “co-accessibility” programs corresponding to the different tumor stages and involved in development and lineage identity.
Digit Development in Embryo
HOX transcription factors are widely involved in embryo development. Despite the similar DNA-binding motifs of the different HOX transcription factors, they drive completely different programs, suggesting a crucial role for the chromatin environment. To understand the dynamics of HOX family binding, Desanlis I. et al. used a model of the developing mouse limb. They focused on HOXA13 and HOXD13 in the context of digit formation, and on HOXA11, which is implicated in forearm and leg development. When they compared the HOXA11 binding profile in limb buds to the HOX13 profile, they observed that the two shared most of the same loci. However, when HOXA11 was ectopically expressed distally, it bound to HOX13-specific loci, suggesting that the specific binding loci of each factor reflect expression in distinct cell populations. With bulk ATAC-Seq, they compared chromatin accessibility in proximal and distal limb buds and discovered that in the distal region, accessible chromatin predominantly contained the HOX13 binding site.
Finally, they used scATAC-Seq to analyze HOX13-dependent chromatin accessibility in wild-type and HOX13 KO mouse forelimb buds. The absence of HOX13 induced a proximal limb bud chromatin profile, providing evidence that HOX13 is necessary for the acquisition of the distal limb chromatin profile.
Naïve T Cell Heterogeneity
The T cell immune response is tightly regulated by multiple negative checkpoint regulators (NCRs). NCRs are expressed on activated T cells, except for V-type immunoglobulin domain-containing suppressor of T cell activation (VISTA), which is also constitutively expressed in naïve T cells.
ElTanbouly M. et al. used a mouse model with VISTA deleted in the CD4+ T cell compartment. scRNA-Seq analysis showed transcriptional phenotype changes as well as heterogeneity in the T cell compartment, with the identification of several clusters. One cluster displayed a reduction in the proportion of naïve T cells when VISTA was knocked out. Another cluster was defined by an overactivation of extracellular matrix interaction signaling associated with an upregulation of the TCR pathway. The last cluster, which was predominant in VISTA KO cells, was characterized by an upregulation of the stem cell memory-like program. Altogether, loss of VISTA induced a reduction in quiescent T cells and altered the naïve T cell repertoire.
To extend the scRNA-Seq analysis, the researchers further analyzed the chromatin state of this compartment using scATAC-Seq. Similar results were obtained, with a decrease in quiescent T cells and an increase in memory-phenotype cells among VISTA KO T cells. The combined analysis of the transcriptome and epigenome of the T cell compartment showed consistent results and confirmed the role of VISTA in naïve T cell maintenance as well as the heterogeneity of this compartment.
What’s Next for Single-Cell ATAC-Seq Assays?
Single-cell technology adapted to ATAC-Seq or RNA-Seq has already revealed the heterogeneity of gene regulation across subpopulations of cells. So far, scATAC-Seq and scRNA-Seq assays are performed on two distinct samples and the integration of the data is done at the bioinformatic level. New technology is being developed that allows scATAC-Seq and scRNA-Seq to be run simultaneously, where one cell would have the same barcode for RNA and chromatin to generate transcriptomic and epigenomic data at a single-cell level. Single-cell technology also exists for other protocols such as ChIP-Seq, so analyzing the dynamics of histones or transcription factor binding in addition to open chromatin at single-cell resolution should reveal a lot of details about epigenomic regulation.
The investigation of the chromatin state in heterogeneous populations using scATAC-Seq paves the way toward the identification of mechanisms responsible for regulating processes that were previously unidentifiable in bulk ATAC-Seq experiments. Going deeper into transcriptional mechanisms in individual cells will highlight potential therapeutic targets that are currently unknown.
Regarding medical applications, analyzing patient samples at the single-cell level could facilitate the choice of therapeutic strategy. Indeed, in cancer, the identification of metastatic clusters in the primary tumor could lead to more aggressive therapy before metastatic cells migrate to another organ.
Summary: Single Cell Assays are Taking Epigenomics to the Next Level
The development of the ATAC-Seq method has already contributed a lot to our understanding of epigenetic mechanisms. Single-cell assays have taken the research to a whole new level where the chromatin states can be analyzed in individual cells. These kinds of experiments allow the identification of transcriptionally active genomic regions in complex tissues and processes. The study of tumors, brain, and pancreas tissues with single-cell assays promises surprising new discoveries. In the same way, research in specific dynamic processes such as development and immunity is now able to reach more detailed discoveries than ever before.
The identification of genomic mutations made a leap forward in the past decade thanks to the development of next-generation sequencing technology. A similar advance in our understanding of the epigenome and its importance in healthy processes and diseases is going to be possible in the next decade thanks to single-cell analysis.
About the author
Anne-Sophie Ay-Berthomieu, Ph.D.
Anne-Sophie was born in the south of France and grew up between the Mediterranean Sea and the Pyrenean Mountains. She grew up as a science fiction fan, leading her to specialize in molecular biology and genetics during graduate school at the University of Lyon, France (secretly hoping her research would give her superpowers!). After living in different places for work, she is back in Lyon, France where she shares her time between her husband, her family, and her friends. During her free time, Anne-Sophie challenges herself with hiking, climbing, racing, and traveling in foreign countries – while waiting for her superpowers to grow!