GSE155649 Processing Pipeline


Publication

Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes.

Nature Methods (2021), PMID 33963355

Dataset

Robust single-cell discovery of RNA targets of RNA binding proteins and ribosomes [RNA-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1
Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz

cutadapt v1.14
$ Bash example
```
cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
```
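The `-m 18` option above discards reads shorter than 18 nt after trimming. As a rough sanity check, the sketch below counts reads under a given length in an uncompressed FASTQ file. `count_short_reads` is a hypothetical helper, not part of cutadapt, and it assumes standard 4-line FASTQ records.

```shell
# Hypothetical helper (not part of cutadapt): count reads shorter than a
# minimum length in an uncompressed FASTQ file with 4-line records.
count_short_reads() {
    local fastq="$1" min_len="$2"
    # Sequence lines are every 4th line, starting at line 2.
    awk -v m="${min_len}" 'NR % 4 == 2 && length($0) < m { n++ } END { print n + 0 }' "${fastq}"
}

# After trimming with -m 18, no read should be shorter than 18 nt.
if [ -f data.fastqTr.fq ]; then
    count_short_reads data.fastqTr.fq 18
fi
```

A non-zero result would suggest that the file was not produced with the minimum-length filter shown above.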
2
Trimmed reads were mapped to repeat elements (RepBase 18.05) with STAR (2.4.0i), and reads that aligned were filtered out, using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8

STAR v2.4.0i
$ Bash example
```
STAR --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
```
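With `--outReadsUnmapped Fastx` and `--outFileNamePrefix data`, STAR writes the reads it did not align to `dataUnmapped.out.mate1`, which is the input to the genome mapping step. The sketch below compares read counts before and after the repeat filter. `count_fastq_reads` is a hypothetical helper that assumes 4-line FASTQ records.

```shell
# Hypothetical helper: count records in an uncompressed 4-line FASTQ file.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# File names assume the "data" prefix used in the STAR command above.
if [ -f data.fastqTr.fq ] && [ -f dataUnmapped.out.mate1 ]; then
    total=$(count_fastq_reads data.fastqTr.fq)
    kept=$(count_fastq_reads dataUnmapped.out.mate1)
    echo "Reads removed by the repeat filter: $((total - kept)) of ${total}"
fi
```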

3
Reads that did not map to repeat elements were mapped to the human genome with STAR, using the same parameters as the previous step but with an hg19 index in place of the repeat element index.

STAR (version not restated for this step; v2.4.0i in the previous step)

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Reads left unmapped by the repeat-element step. With --outReadsUnmapped Fastx
# and --outFileNamePrefix data, STAR writes them uncompressed to this file.
INPUT_FASTQ="dataUnmapped.out.mate1"

# Placeholder for the STAR hg19 genome index directory
# Replace '/path/to/hg19_star_index' with the actual path to your hg19 STAR index
# If the index does not exist, you would need to generate it first:
# STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/hg19_star_index --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf
GENOME_INDEX_DIR="/path/to/hg19_star_index"

# Output prefix for mapped files (placeholder; not given in the source)
OUTPUT_PREFIX="genome_mapped_"

# Number of threads to use
NUM_THREADS=8

# Same parameters as the repeat-element step; only the index, input, and
# output prefix differ.
STAR --runMode alignReads \
     --runThreadN "${NUM_THREADS}" \
     --genomeDir "${GENOME_INDEX_DIR}" \
     --genomeLoad NoSharedMemory \
     --readFilesIn "${INPUT_FASTQ}" \
     --alignEndsType EndToEnd \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outBAMcompression 10 \
     --outFilterMultimapNmax 10 \
     --outFilterMultimapScoreRange 1 \
     --outFilterScoreMin 10 \
     --outFilterType BySJout \
     --outReadsUnmapped Fastx \
     --outSAMattrRGline ID:foo \
     --outSAMattributes All \
     --outSAMmode Full \
     --outSAMtype BAM Unsorted \
     --outSAMunmapped Within \
     --outStd Log


4
Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count reads per feature using human annotations (Gencode v19).

featureCounts (Subread; version not stated in the source)

$ Bash example

# Install the Subread package, which includes featureCounts.
# The source does not state a version; 2.0.6 is an assumption.
# conda install -c bioconda subread=2.0.6

# Download Gencode v19 human annotation GTF file
# wget -O gencode.v19.annotation.gtf.gz ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Run featureCounts
featureCounts -a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam
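featureCounts writes `counts.txt` as a tab-separated table: a comment line starting with `#`, a header line beginning with `Geneid`, then one row per feature, with the count for the first BAM file in column 7. The sketch below reduces that table to gene ID and count. `gene_counts` is a hypothetical helper, and the output file name `gene_counts.tsv` is an assumption.

```shell
# Hypothetical helper: print gene ID and read count (column 7) from a
# featureCounts table produced from a single BAM file.
gene_counts() {
    local counts_file="$1"
    # Skip the leading comment line and the header row.
    awk -F '\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "${counts_file}"
}

if [ -f counts.txt ]; then
    gene_counts counts.txt > gene_counts.tsv
fi
```

If several BAM files are counted in one run, each additional file adds a column after column 7, so the column index would need to change.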

5
Edits were called using SAILOR (http://github.com/yeolab/sailor) with default parameters, and dbSNP (v147) was used to remove known SNPs.

SAILOR (version not specified)

$ Bash example

# SAILOR is distributed through its GitHub repository (http://github.com/yeolab/sailor).
# The source gives neither the installation steps nor the exact command line,
# so take both from the repository documentation rather than from this sketch.

# Placeholders for the reference genome (hg19) and the dbSNP v147 known-SNP file.
# Replace with actual paths; check the SAILOR documentation for the file
# format it expects for known SNPs.
REFERENCE_FASTA="/path/to/reference_genome.fasta"
DBSNP_FILE="/path/to/dbsnp_v147"

# Placeholder for the input BAM file (genome alignments from the STAR step).
INPUT_BAM="input_aligned_reads.bam"

# Run SAILOR with default parameters on the inputs above, supplying the dbSNP
# file so that known SNPs are removed. The invocation itself is left as a
# reminder because the source does not specify it.
echo "Run SAILOR on ${INPUT_BAM} with reference ${REFERENCE_FASTA} and known SNPs ${DBSNP_FILE}"
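To illustrate what removing known SNPs means in practice, the sketch below drops candidate sites whose chromosome and position appear in a known-SNP list. This is not SAILOR's implementation. `remove_known_snps` is a hypothetical helper, and both inputs are assumed to be tab-separated with the chromosome in column 1 and the position in column 2, in the same coordinate system.

```shell
# Hypothetical helper (not part of SAILOR): print candidate sites that are
# absent from a known-SNP list. Assumed layout for both files: tab-separated,
# chromosome in column 1, position in column 2. The SNP file must not be empty.
remove_known_snps() {
    local snps="$1" sites="$2"
    awk -F '\t' '
        NR == FNR { known[$1 FS $2] = 1; next }   # first file: record SNP positions
        !(($1 FS $2) in known)                    # second file: keep unlisted sites
    ' "${snps}" "${sites}"
}
```

A call would look like `remove_known_snps known_snps.tsv candidate_sites.tsv > filtered_sites.tsv`, where all three file names are placeholders.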


6
BigWig files were generated from filtered BAM files (intermediates from SAILOR) using BedTools (v2.27.1) and bedGraphToBigWig, with the following commands:
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw

BedTools v2.27.1

$ Bash example

# Install BedTools (v2.27.1)
# conda install -c bioconda bedtools=2.27.1

# Install UCSC tools (for bedGraphToBigWig)
# conda install -c bioconda ucsc-bedgraphtobigwig

# Reference genome sizes file (hg19)
# Download from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
# Or use a local path if already available
HG19_CHROM_SIZES="hg19.chrom.sizes" # Placeholder, replace with actual path

# Input BAM files (from SAILOR intermediates)
FWD_BAM="fwd.bam" # Replace with actual path to forward strand BAM
REV_BAM="rev.bam" # Replace with actual path to reverse strand BAM

# Generate bedGraph for fwd.bam from reads aligned to the minus strand
# (presumably because the library is reverse-stranded, as featureCounts -s 2 implies)
bedtools genomecov -split -strand - -g "${HG19_CHROM_SIZES}" -bg -ibam "${FWD_BAM}" > fwd.bg

# Sort the forward strand bedGraph
bedtools sort -i fwd.bg > fwd.sorted.bg

# Convert sorted forward strand bedGraph to BigWig
bedGraphToBigWig fwd.sorted.bg "${HG19_CHROM_SIZES}" fwd.sorted.bw

# Generate bedGraph for rev.bam from reads aligned to the plus strand
bedtools genomecov -split -strand + -g "${HG19_CHROM_SIZES}" -bg -ibam "${REV_BAM}" > rev.bg

# Sort the reverse strand bedGraph
bedtools sort -i rev.bg > rev.sorted.bg

# Convert sorted reverse strand bedGraph to BigWig
bedGraphToBigWig rev.sorted.bg "${HG19_CHROM_SIZES}" rev.sorted.bw
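bedGraphToBigWig expects its input sorted by chromosome and then by start position, which is why each bedGraph is sorted before conversion. The sketch below checks that order with standard `sort`. `is_sorted_bedgraph` is a hypothetical helper, and it assumes byte-wise (C locale) chromosome ordering.

```shell
# Hypothetical helper: succeed if a bedGraph is ordered by chromosome
# (byte-wise, C locale) and then numerically by start position.
is_sorted_bedgraph() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

for bg in fwd.sorted.bg rev.sorted.bg; do
    if [ -f "${bg}" ]; then
        if is_sorted_bedgraph "${bg}"; then
            echo "${bg}: sorted"
        else
            echo "${bg}: NOT sorted"
        fi
    fi
done
```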


Tools Used

cutadapt, STAR, featureCounts (Subread), SAILOR, BedTools, bedGraphToBigWig

Raw Source Text

Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f  fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType  EndToEnd  --genomeDir  repbase  --genomeLoad  NoSharedMemory  --outBAMcompression  10  --outFileNamePrefix  data  --outFilterMultimapNmax  10  --outFilterMultimapScoreRange  1  --outFilterScoreMin  10  --outFilterType  BySJout  --outReadsUnmapped  Fastx  --outSAMattrRGline  ID:foo  --outSAMattributes  All  --outSAMmode  Full  --outSAMtype  BAM  Unsorted  --outSAMunmapped  Within  --outStd  Log  --readFilesIn data.fastqTr.fq  --runMode  alignReads  --runThreadN  8
Reads unmapped to repeat elements were mapped to the human genome with STAR using the same parameters as the previous step, using an hg19 index in place of the repeat element index
Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19)
Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs.
bigwig files were generated from filtered BAM files (intermediates from SAILOR), using the following commands from BedTools (v2.27.1). :     bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg  bedtools sort -I fwd.bg > fwd.sorted.bg   bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw    bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg  bedtools sort -I rev.bg > rev.sorted.bg   bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
Genome_build: hg19
