GSE69585 Processing Pipeline
Publication
Target Discrimination in Nonsense-Mediated mRNA Decay Requires Upf1 ATPase Activity. Molecular Cell (2015). PMID 26253027
Dataset
GSE69585: Target discrimination in nonsense-mediated mRNA decay requires Upf1 ATPase activity (RIP-Seq)
Processing Steps
1
Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Bash example
# Install cutadapt (the version is an assumption; the source does not state one)
# conda install -c bioconda cutadapt=1.16
cutadapt \
  --match-read-wildcards \
  --times 2 \
  -e 0 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b TGGAATTCTCGGGTGCCAAGG \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o trimmed_reads.fastq.gz \
  input_reads.fastq.gz
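The `-m 18` setting discards reads shorter than 18 nt after trimming, so a quick check of the trimmed output is to confirm that no shorter read remains. The following is a minimal sketch of such a check, not part of the published pipeline; the helper name `fastq_length_summary` and the toy reads are made up for illustration, and it assumes plain 4-line FASTQ records in an uncompressed file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): report the number of reads and the
# shortest read length in an uncompressed FASTQ file with 4-line records.
fastq_length_summary() {
    local fastq="$1"
    awk 'NR % 4 == 2 {                       # sequence line of each record
             n++
             len = length($0)
             if (n == 1 || len < min) min = len
         }
         END { printf "reads=%d min_length=%d\n", n, min }' "$fastq"
}

# Toy input standing in for trimmed reads (invented sequences).
demo_fastq=$(mktemp)
printf '@r1\n%s\n+\n%s\n' "ACGTACGTACGTACGTACGT" "IIIIIIIIIIIIIIIIIIII" > "$demo_fastq"
printf '@r2\n%s\n+\n%s\n' "ACGTACGTACGTACGTAC" "IIIIIIIIIIIIIIIIII" >> "$demo_fastq"
fastq_length_summary "$demo_fastq"
rm -f "$demo_fastq"
```

For gzipped cutadapt output, decompress to a temporary file first (for example with `gzip -dc`) and pass that file to the helper. A reported `min_length` below 18 would suggest the length filter was not applied.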
2
Reads were then mapped against a database of repetitive elements derived from RepBase18.05.
Bash example (tool and version inferred for this step, not stated in its source sentence: bowtie v1.2.3; step 3 reports Bowtie 1.0.0)
# Install bowtie
# conda install -c bioconda bowtie=1.2.3

# --- Reference Data Preparation (RepBase18.05) ---
# RepBase access typically requires registration. The following is a placeholder for downloading and indexing.
# Replace with the actual download method if you have access to RepBase18.05 sequences.
# Example: Download RepBase sequences (e.g., from girinst.org after registration)
# wget -O RepBase18.05.fasta.gz "https://www.girinst.org/repbase/update/RepBase18.05.fasta.gz"  # Placeholder URL
# gunzip RepBase18.05.fasta.gz

# Build bowtie index for RepBase18.05
# bowtie-build RepBase18.05.fasta RepBase18.05_index

# --- Mapping Reads to Repetitive Elements ---
# Input:  reads.fastq.gz (replace with your actual input reads file)
# Output: reads_rep_mapped.sam (SAM file containing reads mapped to repetitive elements)
# Parameters (-v, -m, --best and --strata are assumptions; the source gives no parameters for this sentence):
#   -v 2: Allow up to 2 mismatches in the alignment.
#   -m 1: Suppress all alignments for a read that maps to more than 1 location.
#   --best --strata: Report only alignments in the best stratum (fewest mismatches).
#   -S: Write SAM output. This is a flag without an argument; the output file is the last positional argument.
#   RepBase18.05_index: The prefix for the RepBase bowtie index.
bowtie -v 2 -m 1 --best --strata -S RepBase18.05_index reads.fastq.gz reads_rep_mapped.sam
3
Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Bash example
# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.0.0

# Assuming 'repbase_index' is the prefix for the Bowtie index files generated from Repbase sequences
# Assuming 'reads.fastq' is the input reads file
# Assuming 'aligned.sam' is the desired output SAM file
bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > aligned.sam
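The next step aligns only the reads that did not map to Repbase, but the source does not say how those reads were separated. One possible approach, sketched here under assumptions, is to pull the unaligned records out of the Bowtie SAM output and write them back as FASTQ. The helper name `sam_unmapped_to_fastq` and the toy records are hypothetical, and the sketch assumes the SAM file contains unaligned reads with FLAG bit 0x4 set.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): convert unmapped SAM records
# (FLAG bit 0x4 set) back to FASTQ on standard output.
sam_unmapped_to_fastq() {
    local sam="$1"
    awk -F '\t' '
        /^@/ { next }                 # skip SAM header lines
        int($2 / 4) % 2 == 1 {        # FLAG bit 0x4: read is unmapped
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }' "$sam"
}

# Toy SAM file with one mapped and one unmapped read (invented records).
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$demo_sam"
printf 'r1\t0\tAluY\t10\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$demo_sam"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTG\tIIIIIIII\n' >> "$demo_sam"
sam_unmapped_to_fastq "$demo_sam"     # prints only the r2 record
rm -f "$demo_sam"
```

Bowtie 1 also offers a `--un <file>` option that writes unaligned reads directly during alignment, which may be simpler than post-processing the SAM file.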
4
Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.
Bash example
# Install STAR (example using conda; this exact version may not be available from the channel)
# conda install -c bioconda star=2.3.0e

# Placeholder for STAR genome index for hg19
# This index needs to be generated once using STAR --runMode genomeGenerate
# Example command to generate index (replace paths and thread count as needed):
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/hg19_star_index \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/hg19_genes.gtf \
#      --runThreadN 8

# Align reads to the hg19 human genome
# Note: --outSAMtype, --outFileNamePrefix and --runThreadN are not among the reported parameters.
# --outSAMtype is added for convenience and may not be available in STAR 2.3.0e; drop it to get SAM output.
STAR \
  --genomeDir /path/to/hg19_star_index \
  --readFilesIn input_reads_not_mapped_to_repbase.fastq \
  --outFileNamePrefix aligned_to_hg19_ \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --runThreadN 8  # Example: use 8 threads for alignment
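With `--outFilterMultimapNmax 1` only uniquely aligned reads should be reported as mapped, and `--outSAMunmapped Within` keeps unmapped reads in the same file. A rough sanity check, sketched below, counts mapped and unmapped records and flags any mapped record whose `NH:i:` tag exceeds 1. The helper name `sam_mapping_summary` and the toy records are hypothetical, and the sketch assumes SAM text input that carries the `NH` attribute.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): summarize a SAM file by counting
# mapped records, unmapped records (FLAG bit 0x4), and mapped records with NH > 1.
sam_mapping_summary() {
    local sam="$1"
    awk -F '\t' '
        /^@/ { next }                            # skip SAM header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }
        {
            mapped++
            for (i = 12; i <= NF; i++)           # optional tags start at column 12
                if ($i ~ /^NH:i:/) {
                    split($i, tag, ":")
                    if (tag[3] + 0 > 1) multi++
                }
        }
        END { printf "mapped=%d unmapped=%d multimapped=%d\n", mapped, unmapped, multi }' "$sam"
}

# Toy SAM file: one uniquely mapped read and one unmapped read (invented records).
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:coordinate\n' > "$demo_sam"
printf 'r1\t0\tchr1\t100\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNH:i:1\tHI:i:1\n' >> "$demo_sam"
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTG\tIIIIIIII\tNH:i:0\n' >> "$demo_sam"
sam_mapping_summary "$demo_sam"
rm -f "$demo_sam"
```

If the alignment was written as BAM, convert it to SAM text first (for example with `samtools view -h`). A non-zero `multimapped` count would indicate that the multimapper filter did not behave as expected.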
5
RPKMs for each gene annotated in GENCODE v17 were calculated from RIP-seq data using custom scripts.
Bash example
# Download GENCODE v17 annotation
# mkdir -p references
# wget -P references ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_17/gencode.v17.annotation.gtf.gz
# gunzip references/gencode.v17.annotation.gtf.gz

# Assume aligned RIP-seq BAM file is available: rip_seq_aligned.bam
# And the GTF file is: references/gencode.v17.annotation.gtf

# 1. Count reads per gene using featureCounts from the Subread package
#    (a stand-in: the source only says "custom scripts")
# conda install -c bioconda subread
featureCounts -a references/gencode.v17.annotation.gtf -o gene_counts.txt -F GTF -t exon -g gene_id rip_seq_aligned.bam

# 2. Get total mapped reads from the BAM file (excluding unmapped reads)
# conda install -c bioconda samtools
total_mapped_reads=$(samtools view -c -F 4 rip_seq_aligned.bam)

# 3. Calculate RPKM for each gene using a custom awk script
# This script parses the featureCounts output and applies the RPKM formula.
# It assumes the featureCounts output file 'gene_counts.txt' has the following columns:
# Geneid, Chr, Start, End, Strand, Length, /path/to/rip_seq_aligned.bam (read counts)
# It skips the leading comment line and the column header line.
awk -v total_reads="$total_mapped_reads" '
BEGIN { OFS="\t"; print "Geneid", "RPKM" }
/^#/ { next }             # Skip the comment line written by featureCounts
$1 == "Geneid" { next }   # Skip the column header line of the data table
{
  gene_id = $1
  gene_length = $6
  read_count = $7         # Assuming the BAM file is the 7th column in featureCounts output
  if (gene_length > 0 && total_reads > 0) {
    rpkm = (read_count * 10^9) / (gene_length * total_reads)
    print gene_id, rpkm
  } else {
    print gene_id, 0      # Handle cases with zero length or zero total reads
  }
}' gene_counts.txt > gene_rpkm.txt
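As a worked instance of the RPKM formula, the sketch below computes RPKM = (reads × 10^9) / (gene length in bp × total mapped reads) for a single gene. The function name `rpkm` and the numbers are illustrative assumptions, not values from the dataset, and the authors' custom scripts may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): RPKM for one gene.
# Arguments: read count, gene length in bp, total mapped reads in the library.
rpkm() {
    local count="$1" length_bp="$2" total_mapped="$3"
    awk -v c="$count" -v l="$length_bp" -v n="$total_mapped" 'BEGIN {
        if (l > 0 && n > 0) printf "%.4f\n", (c * 1e9) / (l * n)
        else print "NA"                 # undefined for zero length or empty library
    }'
}

# Made-up example: 500 reads on a 2,000 bp gene in a library of 10,000,000 mapped reads.
# (500 * 1e9) / (2000 * 1e7) = 25
rpkm 500 2000 10000000   # prints 25.0000
```

Dividing by gene length in kilobases and by library size in millions is what makes values comparable across genes and across samples.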
Raw Source Text
Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009). Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1. RPKMs for each gene annotated in gencode v17 were calculated from RIP-seq data using custom scripts.
Genome_build: hg19
Supplementary_files_format_and_content: rpkm files, contains RPKMs for each sample