GSE155649 Processing Pipeline
Publication
Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes. Nature Methods (2021), PMID 33963355
Dataset
GSE155649: Robust single-cell discovery of RNA targets of RNA binding proteins and ribosomes [RNA-seq]
Processing Steps
1
Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
$ Bash example
cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
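A quick sanity check on the trimming step is to confirm that no read shorter than the `-m 18` minimum survived. The following is a minimal sketch, not part of the published pipeline; the helper name `min_read_length` is hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Print the length of the shortest read in an uncompressed FASTQ file.
# Assumes 4-line FASTQ records, so the sequence is line 2 of each record.
# (Hypothetical helper for illustration; not part of the original pipeline.)
min_read_length() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (min == "" || n < min) min = n
         }
         END { if (min != "") print min }' "$1"
}

# Usage, with the output file name from the cutadapt command:
#   min_read_length data.fastqTr.fq
```

With `-m 18` in effect, the reported value should be 18 or greater.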
2
Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
$ Bash example
STAR --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
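Because this step acts as a filter, it can help to see what fraction of trimmed reads did not map to repeat elements and therefore move on to genome mapping. The sketch below is an illustration only: the function names are hypothetical, and it assumes both files are uncompressed four-line FASTQ. With `--outReadsUnmapped Fastx` and `--outFileNamePrefix data`, STAR normally writes the unmapped reads to `dataUnmapped.out.mate1`; verify the file name in your own run.

```shell
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file (4 lines per record assumed).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of input reads that were NOT mapped to repeat elements.
#   $1 = trimmed FASTQ given to STAR
#   $2 = unmapped-read FASTQ written by STAR
# (Hypothetical helpers for illustration; not part of the original pipeline.)
percent_retained() {
    local total kept
    total=$(count_fastq_reads "$1")
    kept=$(count_fastq_reads "$2")
    awk -v t="$total" -v k="$kept" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * k / t; else print "NA" }'
}

# Usage:
#   percent_retained data.fastqTr.fq dataUnmapped.out.mate1
```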
3
Reads that did not map to repeat elements were mapped to the human genome with STAR using the same parameters as in the previous step, with an hg19 index in place of the repeat element index.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Unmapped reads from the repeat-element step. With --outReadsUnmapped Fastx and
# --outFileNamePrefix data, STAR typically writes them to dataUnmapped.out.mate1;
# replace this with your actual file if it differs.
INPUT_FASTQ="dataUnmapped.out.mate1"

# Placeholder for the STAR hg19 genome index directory.
# If the index does not exist, generate it first:
# STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/hg19_star_index --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf
GENOME_INDEX_DIR="/path/to/hg19_star_index"

# Placeholder output prefix, chosen so the previous step's files are not overwritten
OUTPUT_PREFIX="genome_mapped_"

# Same parameters as the repeat-element step; only the index, input, and output prefix change
STAR --alignEndsType EndToEnd \
  --genomeDir "${GENOME_INDEX_DIR}" \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --readFilesIn "${INPUT_FASTQ}" \
  --runMode alignReads \
  --runThreadN 8
4
Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19)
featureCounts v2.0.6
$ Bash example
# Install the Subread package, which includes featureCounts
# conda install -c bioconda subread=2.0.6

# Download the Gencode v19 human annotation GTF file
# wget -O gencode.v19.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Run featureCounts
featureCounts -a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam
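To inspect the result, the counts table can be reduced to gene IDs and their counts. This is a minimal sketch under the assumption that `counts.txt` has the usual featureCounts layout: one leading `#` comment line, a header line starting with `Geneid`, then the columns Geneid, Chr, Start, End, Strand, Length, followed by one count column per BAM file. The function name `expressed_genes` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "gene_id<TAB>count" for genes with at least a minimum number of reads.
#   $1 = featureCounts output table (e.g. counts.txt)
#   $2 = minimum count to report (default 1)
# Column 7 is the count column when a single BAM file was counted.
# (Hypothetical helper for illustration; not part of the original pipeline.)
expressed_genes() {
    local counts_file=$1 min_count=${2:-1}
    awk -F '\t' -v min="$min_count" '
        /^#/ || $1 == "Geneid" { next }      # skip comment and header lines
        $7 >= min { print $1 "\t" $7 }       # keep genes at or above the threshold
    ' "$counts_file"
}

# Usage:
#   expressed_genes counts.txt 10 | sort -k2,2nr | head
```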
5
Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs.
$ Bash example
# Obtain SAILOR from its repository (http://github.com/yeolab/sailor) and follow
# the installation instructions given there.

# Placeholders for the reference genome FASTA and the dbSNP v147 file.
# dbSNP v147 can typically be downloaded from NCBI or UCSC.
REFERENCE_FASTA="/path/to/reference_genome.fasta"
DBSNP_VCF="/path/to/dbsnp_v147.vcf.gz"

# Placeholders for the input BAM file (genome alignments from the previous steps)
# and the output file for called edits.
INPUT_BAM="input_aligned_reads.bam"
OUTPUT_VCF="output_rna_edits_filtered_dbsnp.vcf"

# NOTE: the source text gives no SAILOR command line, only "default parameters"
# and dbSNP v147 for removing known SNPs. The invocation below is a hypothetical
# placeholder: its subcommand and flag names are NOT confirmed against SAILOR.
# Consult the repository README for the actual interface before running.
sailor call \
  -b "${INPUT_BAM}" \
  -r "${REFERENCE_FASTA}" \
  -o "${OUTPUT_VCF}" \
  --dbsnp "${DBSNP_VCF}"
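The idea behind removing known SNPs can be illustrated independently of SAILOR: any candidate edit site whose chromosome and position appear in dbSNP is discarded, since the mismatch is more likely a genomic variant than an RNA edit. The sketch below is an assumption-laden illustration, not SAILOR's implementation; the function name and the tab-separated input layout (chromosome in column 1, 1-based position in column 2, as in the first two VCF columns) are hypothetical.

```shell
#!/usr/bin/env bash
# Drop candidate sites whose (chrom, pos) appears in a known-SNP list.
#   $1 = known SNPs, tab-separated, chrom and pos in columns 1-2
#   $2 = candidate sites, tab-separated, chrom and pos in columns 1-2
# Lines starting with '#' (VCF-style headers) are ignored in both files.
# (Illustrative only; this is not how SAILOR itself is implemented.)
remove_known_snps() {
    awk -F '\t' -v snps="$1" '
        BEGIN {
            # Load every known SNP position into a lookup table.
            while ((getline line < snps) > 0) {
                if (line ~ /^#/) continue
                split(line, f, "\t")
                known[f[1] FS f[2]] = 1
            }
        }
        /^#/ { next }
        !(($1 FS $2) in known)   # print only sites absent from the table
    ' "$2"
}

# Usage (uncompressed, hypothetical file names):
#   remove_known_snps dbsnp_v147.sites.tsv candidate_edits.tsv > edits.no_snps.tsv
```

Chromosome names must follow the same convention in both files (for example `chr1` in both, not `chr1` in one and `1` in the other), otherwise no sites will match.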
6
BigWig files were generated from filtered BAM files (intermediates from SAILOR) using the following BedTools (v2.27.1) and bedGraphToBigWig commands:
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
$ Bash example
# Install BedTools (v2.27.1)
# conda install -c bioconda bedtools=2.27.1

# Install UCSC tools (for bedGraphToBigWig)
# conda install -c bioconda ucsc-bedgraphtobigwig

# Reference genome sizes file (hg19)
# Download from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
# Or use a local path if already available
HG19_CHROM_SIZES="hg19.chrom.sizes"  # Placeholder, replace with actual path

# Input BAM files (from SAILOR intermediates)
FWD_BAM="fwd.bam"  # Replace with actual path to forward strand BAM
REV_BAM="rev.bam"  # Replace with actual path to reverse strand BAM

# Generate bedGraph for forward strand (negative strand coverage)
bedtools genomecov -split -strand - -g "${HG19_CHROM_SIZES}" -bg -ibam "${FWD_BAM}" > fwd.bg
# Sort the forward strand bedGraph
bedtools sort -i fwd.bg > fwd.sorted.bg
# Convert sorted forward strand bedGraph to BigWig
bedGraphToBigWig fwd.sorted.bg "${HG19_CHROM_SIZES}" fwd.sorted.bw

# Generate bedGraph for reverse strand (positive strand coverage)
bedtools genomecov -split -strand + -g "${HG19_CHROM_SIZES}" -bg -ibam "${REV_BAM}" > rev.bg
# Sort the reverse strand bedGraph
bedtools sort -i rev.bg > rev.sorted.bg
# Convert sorted reverse strand bedGraph to BigWig
bedGraphToBigWig rev.sorted.bg "${HG19_CHROM_SIZES}" rev.sorted.bw
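The sort step matters because bedGraphToBigWig expects its input ordered by chromosome and then by start position. A small check such as the following can confirm the ordering before conversion. It is a sketch only: the function name `is_sorted_bedgraph` is hypothetical, and it assumes byte-wise (C locale) chromosome ordering.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if a bedGraph file is sorted by chromosome name,
# then numerically by start position; fail otherwise.
# LC_ALL=C forces byte-wise comparison of chromosome names.
# (Hypothetical helper for illustration; not part of the original pipeline.)
is_sorted_bedgraph() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage:
#   is_sorted_bedgraph fwd.sorted.bg && echo "fwd.sorted.bg is ready for bedGraphToBigWig"
```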
Raw Source Text
Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8 Reads unmapped to repeat elements were mapped to the human genome with STAR using the same parameters as the previous step, using an hg19 index in place of the repeat element index Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19) Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs. bigwig files were generated from filtered BAM files (intermediates from SAILOR), using the following commands from BedTools (v2.27.1). : bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg bedtools sort -I fwd.bg > fwd.sorted.bg bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg bedtools sort -I rev.bg > rev.sorted.bg bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw Genome_build: hg19