GSE155729 Processing Pipeline


Publication

Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes.

Nature Methods (2021), PMID 33963355

Dataset

Robust single-cell discovery of RNA targets of RNA binding proteins and ribosomes

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz

cutadapt v1.14 GitHub

$ Bash example

# cutadapt is a Python package and can be installed via pip or conda.
# For example, using conda:
# conda install -c bioconda cutadapt=1.14

cutadapt -O 5 -f fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
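As a quick sanity check on the `-m 18` setting, the sketch below reports the shortest read in a trimmed FASTQ file. The helper name `min_read_length` and the tiny demo file are hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Report the length of the shortest read in an uncompressed, four-line-per-record FASTQ file.
# min_read_length is a hypothetical helper, not part of cutadapt.
min_read_length() {
    awk 'NR % 4 == 2 { n = length($0); if (min == "" || n < min) min = n }
         END { print (min == "" ? 0 : min) }' "$1"
}

# Demo on a tiny made-up FASTQ file (two reads of 20 nt and 18 nt)
demo_fq="$(mktemp)"
printf '@r1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' > "$demo_fq"
printf '@r2\nACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIII\n' >> "$demo_fq"

shortest="$(min_read_length "$demo_fq")"
echo "shortest read: ${shortest} nt"
rm -f "$demo_fq"
```

On real output, running `min_read_length data.fastqTr.fq` should report a value of at least 18 if the minimum-length filter was applied.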


Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8

STAR v2.4.0i GitHub
$ Bash example
```
STAR --alignEndsType EndToEnd --genomeDir repbase --genomeLoad NoSharedMemory --outBAMcompression 10 --outFileNamePrefix data --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 --outFilterScoreMin 10 --outFilterType BySJout --outReadsUnmapped Fastx --outSAMattrRGline ID:foo --outSAMattributes All --outSAMmode Full --outSAMtype BAM Unsorted --outSAMunmapped Within --outStd Log --readFilesIn data.fastqTr.fq --runMode alignReads --runThreadN 8
```

Reads unmapped to repeat elements were mapped to the human genome with STAR using the same parameters as the previous step, using an hg19 index in place of the repeat element index

STAR (version not specified for this step; v2.4.0i was used in the previous step) GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables
# With --outFileNamePrefix data and --outReadsUnmapped Fastx, the previous step writes
# its unmapped reads to dataUnmapped.out.mate1 (STAR's naming convention; uncompressed).
INPUT_FASTQ="dataUnmapped.out.mate1"
GENOME_DIR="/path/to/STAR_hg19_index" # Placeholder for hg19 STAR index (e.g., from UCSC hg19)
OUTPUT_PREFIX="data.hg19." # Hypothetical prefix, chosen so the repeat-mapping outputs are not overwritten
THREADS=8

# Align reads to the human genome (hg19) with the same parameters as the
# repeat-element step, changing only the index, the input, and the output prefix
STAR --runMode alignReads \
     --runThreadN "${THREADS}" \
     --genomeDir "${GENOME_DIR}" \
     --genomeLoad NoSharedMemory \
     --readFilesIn "${INPUT_FASTQ}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --alignEndsType EndToEnd \
     --outFilterMultimapNmax 10 \
     --outFilterMultimapScoreRange 1 \
     --outFilterScoreMin 10 \
     --outFilterType BySJout \
     --outReadsUnmapped Fastx \
     --outSAMattrRGline ID:foo \
     --outSAMattributes All \
     --outSAMmode Full \
     --outSAMtype BAM Unsorted \
     --outSAMunmapped Within \
     --outBAMcompression 10 \
     --outStd Log
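Because the source states that genome mapping reuses the repeat-element parameters, one way to keep the two invocations in sync is to build both from a single helper. The following is a minimal sketch, not the authors' script: `star_cmd` is a hypothetical function that only prints the command (a dry run), the `data.hg19.` prefix is an illustrative choice, and `dataUnmapped.out.mate1` assumes STAR's usual naming of unmapped reads under the prefix `data`.

```shell
# Shared STAR options taken from the repeat-element step; only the index ($1),
# the input reads ($2) and the output prefix ($3) vary between the two steps.
# star_cmd is a hypothetical helper that prints the command instead of running it.
star_cmd() {
    echo STAR --runMode alignReads --runThreadN 8 \
        --genomeDir "$1" --genomeLoad NoSharedMemory \
        --readFilesIn "$2" --outFileNamePrefix "$3" \
        --alignEndsType EndToEnd \
        --outFilterMultimapNmax 10 --outFilterMultimapScoreRange 1 \
        --outFilterScoreMin 10 --outFilterType BySJout \
        --outReadsUnmapped Fastx --outSAMattrRGline ID:foo \
        --outSAMattributes All --outSAMmode Full \
        --outSAMtype BAM Unsorted --outSAMunmapped Within \
        --outBAMcompression 10 --outStd Log
}

# Step 2: map trimmed reads to the repeat-element index
repeat_cmd="$(star_cmd repbase data.fastqTr.fq data)"
# Step 3: map the reads left unmapped by step 2 to the hg19 index (placeholder path)
genome_cmd="$(star_cmd /path/to/STAR_hg19_index dataUnmapped.out.mate1 data.hg19.)"

echo "$repeat_cmd"
echo "$genome_cmd"
```

Printing the commands first makes it easy to confirm that the two steps differ only in the index, input, and prefix before anything is executed.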


Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19)

featureCounts v2.0.3 GitHub

$ Bash example

# Install Subread (which includes featureCounts)
# conda install -c bioconda subread

# Download Gencode v19 human annotation GTF (GRCh37/hg19)
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
gunzip gencode.v19.annotation.gtf.gz

# Execute featureCounts
featureCounts -a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam
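To inspect the result, the sketch below summarizes a counts table in the layout featureCounts writes: a `#` comment line, a header line, then Geneid, Chr, Start, End, Strand, Length, and one count column per BAM file. The mock file and gene names are made up for illustration; with a single BAM the counts sit in column 7.

```shell
# Build a tiny mock counts.txt in featureCounts' layout (values are invented).
counts_file="$(mktemp)"
{
    printf '# Program:featureCounts; Command: (mock)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tdata.bam\n'
    printf 'GENE_A\tchr1\t100\t500\t+\t401\t12\n'
    printf 'GENE_B\tchr2\t200\t900\t-\t701\t0\n'
    printf 'GENE_C\tchr3\t50\t250\t+\t201\t30\n'
} > "$counts_file"

# Keep genes with at least one assigned read and print gene ID plus count (column 7).
expressed="$(awk -F '\t' '!/^#/ && $1 != "Geneid" && $7 + 0 > 0 { print $1 "\t" $7 }' "$counts_file")"
# Sum the count column to get the total number of assigned reads.
total="$(awk -F '\t' '!/^#/ && $1 != "Geneid" { s += $7 } END { print s + 0 }' "$counts_file")"

echo "$expressed"
echo "total assigned: $total"
rm -f "$counts_file"
```

Pointing `counts_file` at the real `counts.txt` (and skipping the mock section) gives the same summary for the actual run.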


Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs.

SAILOR v0.1.0 GitHub

$ Bash example

# Install SAILOR by following the instructions in its repository (http://github.com/yeolab/sailor)

# Placeholder for input BAM file (e.g., genome-aligned reads from the STAR step)
INPUT_BAM="input_aligned_reads.bam"
# Placeholder for the output file where called edits will be stored
OUTPUT_VCF="rna_edits.vcf"
# Placeholder for the reference genome FASTA file (hg19, matching the genome build used for alignment)
REFERENCE_FASTA="/path/to/reference_genome/hg19.fasta"
# Path to the dbSNP v147 file used to filter known SNPs
DBSNP_VCF="/path/to/dbsnp/dbSNP_v147.vcf"

# Call RNA edits using SAILOR with default parameters and filter against dbSNP v147.
# NOTE: the command name and flags below are illustrative placeholders, not a confirmed
# SAILOR interface; check the repository README for the actual invocation before running.
sailor call -i "${INPUT_BAM}" -o "${OUTPUT_VCF}" -r "${REFERENCE_FASTA}" -d "${DBSNP_VCF}"
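To make the "remove known SNPs" idea concrete, the sketch below drops candidate sites whose chromosome and position match a known SNP. It illustrates the filtering concept only and is not SAILOR's implementation; the two-column tab-separated layout (chromosome, 1-based position) and all coordinates are assumptions made for the example.

```shell
# Mock inputs: candidate edit sites and known SNPs as tab-separated chromosome and position.
candidates="$(mktemp)"
known_snps="$(mktemp)"
printf 'chr1\t1000\nchr1\t2000\nchr2\t1500\n' > "$candidates"
printf 'chr1\t2000\nchr3\t42\n' > "$known_snps"

# First pass (NR == FNR) loads the SNP positions into an array;
# second pass prints only the candidates whose chromosome/position key is absent from it.
filtered="$(awk -F '\t' 'NR == FNR { snp[$1 FS $2] = 1; next }
                         !(($1 FS $2) in snp)' "$known_snps" "$candidates")"

echo "$filtered"
rm -f "$candidates" "$known_snps"
```

In this toy input, the candidate at chr1:2000 coincides with a known SNP and is removed, while the other two sites are kept.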


bigWig files were generated from filtered BAM files (intermediates from SAILOR) using the following commands from BedTools (v2.27.1) and bedGraphToBigWig:

bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw

bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw

BedTools v2.27.1, UCSC tools (version not specified) GitHub

$ Bash example

# Install BedTools (v2.27.1) if not already installed
# conda install -c bioconda bedtools=2.27.1

# Install UCSC tools (bedGraphToBigWig) if not already installed
# conda install -c bioconda ucsc-bedgraphtobigwig

# Reference genome sizes file (hg19.chrom.sizes)
# This file can typically be found on UCSC Genome Browser downloads or similar resources.
# Example: wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes

# Generate bigwig for forward strand
bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg
bedtools sort -i fwd.bg > fwd.sorted.bg
bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw

# Generate bigwig for reverse strand
bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg
bedtools sort -i rev.bg > rev.sorted.bg
bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
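bedGraphToBigWig generally expects every bedGraph interval to sit on a chromosome listed in the sizes file and to end within that chromosome's length. The sketch below is an optional pre-flight check under that assumption; the mock files and coordinates are invented for illustration.

```shell
# Mock chrom.sizes (name, length) and bedGraph (chrom, start, end, value) files.
sizes="$(mktemp)"
bg="$(mktemp)"
printf 'chr1\t1000\nchr2\t500\n' > "$sizes"
printf 'chr1\t0\t100\t3\nchr2\t450\t520\t1\nchrUn\t0\t10\t2\n' > "$bg"

# Load chromosome lengths from the first file, then report bedGraph rows that are
# on a chromosome missing from the sizes file or that extend past the chromosome end.
bad="$(awk -F '\t' 'NR == FNR { len[$1] = $2; next }
                    !($1 in len) || $3 + 0 > len[$1] + 0 { print }' "$sizes" "$bg")"

echo "$bad"
rm -f "$sizes" "$bg"
```

An empty result on real data (for example with `hg19.chrom.sizes` and `fwd.sorted.bg`) suggests the coordinates are consistent with the sizes file.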


Tools Used

STAR SAILOR

Raw Source Text

Raw reads were trimmed using cutadapt (v1.14) using the following parameters -O 5 -f  fastq --match-read-wildcards --times 2 -e 0.0 --quality-cutoff 6 -m 18 -o data.fastqTr.fq -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT data.fastq.gz
Trimmed reads were mapped to and filtered of repeat elements (RepBase 18.05) with STAR (2.4.0i) using the following parameters: --alignEndsType  EndToEnd  --genomeDir  repbase  --genomeLoad  NoSharedMemory  --outBAMcompression  10  --outFileNamePrefix  data  --outFilterMultimapNmax  10  --outFilterMultimapScoreRange  1  --outFilterScoreMin  10  --outFilterType  BySJout  --outReadsUnmapped  Fastx  --outSAMattrRGline  ID:foo  --outSAMattributes  All  --outSAMmode  Full  --outSAMtype  BAM  Unsorted  --outSAMunmapped  Within  --outStd  Log  --readFilesIn data.fastqTr.fq  --runMode  alignReads  --runThreadN  8
Reads unmapped to repeat elements were mapped to the human genome with STAR using the same parameters as the previous step, using an hg19 index in place of the repeat element index
Subread featureCounts (-a gencode.v19.annotation.gtf -s 2 -p -o counts.txt data.bam) was used to count features using human annotations (Gencode v19)
Edits were called using SAILOR (http://github.com/yeolab/sailor) using default parameters and dbSNP (v147) to remove known SNPs.
bigwig files were generated from filtered BAM files (intermediates from SAILOR), using the following commands from BedTools (v2.27.1). :     bedtools genomecov -split -strand - -g hg19.chrom.sizes -bg -ibam fwd.bam > fwd.bg  bedtools sort -I fwd.bg > fwd.sorted.bg   bedGraphToBigWig fwd.sorted.bg hg19.chrom.sizes fwd.sorted.bw    bedtools genomecov -split -strand + -g hg19.chrom.sizes -bg -ibam rev.bam > rev.bg  bedtools sort -I rev.bg > rev.sorted.bg   bedGraphToBigWig rev.sorted.bg hg19.chrom.sizes rev.sorted.bw
Genome_build: hg19
