GSE155729 Processing Pipeline
Publication
Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes. Nature Methods (2021), PMID 33963355
Processing Steps
1
Raw reads were trimmed with cutadapt (v1.14) using the following parameters: -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
$ Bash example
# cutadapt is a Python package and can be installed via pip or conda.
# For example, using conda:
# conda install -c bioconda cutadapt=1.14
cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 \
  -o data.fastqTr.fq \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  data.fastq.gz
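The -m 18 option discards reads shorter than 18 bases after trimming, so the shortest read left in data.fastqTr.fq should be at least 18 bases long. The following is a minimal sketch of a sanity check for that, not part of the published pipeline; the function name min_read_length is hypothetical, and it assumes an uncompressed four-line-per-record FASTQ file.

```shell
# min_read_length FILE: print the length of the shortest read in a
# four-line-per-record FASTQ file (the sequence is line 2 of each record).
min_read_length() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (min == "" || n < min) min = n
         }
         END { print min }' "$1"
}

# Hypothetical usage after trimming:
#   min_read_length data.fastqTr.fq    # expected to print a value >= 18
```

An empty input file prints an empty line, which is worth treating as a failed trimming run rather than a pass.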
-
2
Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
$ Bash example
STAR --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
-
3
Reads that did not map to repeat elements were mapped to the human genome with STAR using the same parameters as in the previous step, with an hg19 index in place of the repeat element index.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star
# Input: reads left unmapped by the repeat-element alignment. With --outReadsUnmapped Fastx
# and the output prefix "data", STAR writes them to dataUnmapped.out.mate1.
INPUT_FASTQ="dataUnmapped.out.mate1"
# Placeholders: the source does not name the hg19 index directory or the output prefix.
GENOME_DIR="/path/to/STAR_hg19_index"
OUTPUT_PREFIX="data.hg19."
# Same parameters as the repeat-element step, with the hg19 index swapped in
STAR --alignEndsType EndToEnd \
  --genomeDir "${GENOME_DIR}" \
  --genomeLoad NoSharedMemory \
  --outBAMcompression 10 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 10 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outSAMattrRGline ID:foo \
  --outSAMattributes All \
  --outSAMmode Full \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --outStd Log \
  --readFilesIn "${INPUT_FASTQ}" \
  --runMode alignReads \
  --runThreadN 8
-
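Because the repeat-element and hg19 alignments share every parameter except the index, input, and output prefix, the two calls can be generated from one parameter list. The sketch below only prints the command lines (a dry run) and does not invoke STAR. The function name star_cmd, the hg19 index path, and the data.hg19. prefix are hypothetical; dataUnmapped.out.mate1 assumes STAR's naming of unmapped reads under --outReadsUnmapped Fastx with the prefix "data".

```shell
# Parameters shared by both alignments, copied from the repeat-element step.
STAR_SHARED_ARGS="--alignEndsType EndToEnd --genomeLoad NoSharedMemory --outBAMcompression 10 --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --runMode alignReads --runThreadN 8"

# star_cmd INDEX_DIR READS PREFIX: print (do not run) the STAR command line.
star_cmd() {
    printf 'STAR %s --genomeDir %s --readFilesIn %s --outFileNamePrefix %s\n' \
        "$STAR_SHARED_ARGS" "$1" "$2" "$3"
}

# Repeat-element alignment, then hg19 alignment of the reads it left unmapped.
star_cmd repbase data.fastqTr.fq data
star_cmd /path/to/STAR_hg19_index dataUnmapped.out.mate1 data.hg19.
```

Keeping the shared parameters in one place makes it harder for the two alignments to drift apart when a setting is changed.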
4
Features were counted with Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) using human annotations (Gencode v19).
$ Bash example
# Install Subread (which includes featureCounts)
# conda install -c bioconda subread
# Download Gencode v19 human annotation GTF (GRCh37/hg19)
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
gunzip gencode.v19.annotation.gtf.gz
# Execute featureCounts
featureCounts -a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam
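The counts.txt file written by featureCounts starts with a comment line beginning with "#", followed by a header row (Geneid, Chr, Start, End, Strand, Length, then one column per BAM file) and one tab-separated row per gene. As a minimal sketch of how to pull out the per-gene counts for the single data.bam column, the hypothetical helpers gene_counts and total_counts below read column 7; they are illustrations, not part of the published pipeline.

```shell
# gene_counts FILE: print "Geneid<TAB>count" for each gene, skipping the
# leading comment line and the header row of a featureCounts table.
gene_counts() {
    awk -F'\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "$1"
}

# total_counts FILE: print the sum of the counts over all genes.
total_counts() {
    awk -F'\t' '!/^#/ && $1 != "Geneid" { sum += $7 } END { print sum + 0 }' "$1"
}

# Hypothetical usage:
#   gene_counts counts.txt | sort -k2,2nr | head
#   total_counts counts.txt
```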
-
5
Edits were called with SAILOR (http://github.com/yeolab/sailor) using default parameters, and known SNPs were removed using dbSNP (v147).
$ Bash example
# SAILOR is distributed through its repository (http://github.com/yeolab/sailor);
# follow the README there for installation.
# Placeholders for the inputs (the source does not name these files):
INPUT_BAM="data.bam"                                  # hg19 alignments from the previous steps
REFERENCE_FASTA="/path/to/reference_genome/hg19.fa"   # must match the hg19 genome build
KNOWN_SNPS="/path/to/dbsnp/dbSNP_v147"                # dbSNP v147 known SNPs to remove
# Assumption: the source gives no command line for this step, so the invocation below is a
# hypothetical placeholder and not SAILOR's confirmed interface. Consult the README for the
# actual command and job file format, supply the inputs above, and keep default parameters.
# sailor <job file listing INPUT_BAM, REFERENCE_FASTA and KNOWN_SNPS>
-
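Removing known SNPs amounts to dropping every candidate edit site whose position is already listed in dbSNP. SAILOR performs this filtering itself; the sketch below only illustrates the idea on simplified inputs and is not SAILOR's implementation. The function name remove_known_snps and the file layout (tab-separated, chromosome in column 1 and position in column 2) are assumptions made for the example.

```shell
# remove_known_snps SNPS SITES: print the rows of SITES whose
# (chromosome, position) pair does not appear in SNPS.
# Both files are tab-separated with chromosome in column 1, position in column 2.
remove_known_snps() {
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1 FS $2] = 1; next }   # load known SNP positions
        !(($1 FS $2) in known)                              # keep sites not in dbSNP
    ' "$1" "$2"
}

# Hypothetical usage:
#   remove_known_snps known_snps.tsv candidate_sites.tsv > filtered_sites.tsv
```

Testing FILENAME against ARGV[1], rather than the common NR == FNR idiom, keeps the filter correct when the SNP file is empty.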
6
BigWig files were generated from filtered BAM files (intermediates from SAILOR) using the following commands from BedTools (v2.27.1):
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
$ Bash example
# Install BedTools (v2.27.1) if not already installed
# conda install -c bioconda bedtools=2.27.1
# Install UCSC tools (bedGraphToBigWig) if not already installed
# conda install -c bioconda ucsc-bedgraphtobigwig
# Reference genome sizes file (hg19.chrom.sizes)
# This file can typically be found on UCSC Genome Browser downloads or similar resources.
# Example: wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
# Generate bigwig for forward strand
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
# Generate bigwig for reverse strand
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
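The sort step is there because bedGraphToBigWig expects its input ordered by chromosome and then by start position. As a minimal sketch, the hypothetical helper is_sorted_bedgraph below checks that ordering with the standard sort utility before conversion; it is an added safeguard and not part of the published commands.

```shell
# is_sorted_bedgraph FILE: succeed (exit status 0) if FILE is ordered by
# chromosome name (byte order) and then numerically by start position.
is_sorted_bedgraph() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Hypothetical usage:
#   is_sorted_bedgraph fwd.sorted.bg && \
#       bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
```

Setting LC_ALL=C makes the chromosome comparison byte-wise, so the check does not depend on the locale of the machine running it.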
Raw Source Text
Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
Reads unmapped to repeat elements were mapped to the human genome with STAR using the same parameters as the previous step, using an hg19 index in place of the repeat element index
Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19)
Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs.
bigwig files were generated from filtered BAM files (intermediates from SAILOR), using the following commands from BedTools (v2.27.1). :
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -I fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -I rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
Genome_build: hg19