GSE69585 Processing Pipeline

RIP-Seq, 5 steps

Publication

Target Discrimination in Nonsense-Mediated mRNA Decay Requires Upf1 ATPase Activity.

Molecular Cell (2015), PMID 26253027

Dataset

GSE69585

Target discrimination in nonsense-mediated mRNA decay requires Upf1 ATPase activity (RIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

    cutadapt v1.16 (the source text does not state a cutadapt version)
    $ Bash example
    # Install cutadapt
    # conda install -c bioconda cutadapt=1.16
    
    cutadapt \
      --match-read-wildcards \
      --times 2 \
      -e 0 \
      -O 5 \
      --quality-cutoff 6 \
      -m 18 \
      -b TCGTATGCCGTCTTCTGCTTG \
      -b ATCTCGTATGCCGTCTTCTGCTTG \
      -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
      -b TGGAATTCTCGGGTGCCAAGG \
      -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
      -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
      -o trimmed_reads.fastq.gz \
      input_reads.fastq.gz
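
The -m 18 setting above means no trimmed read should be shorter than 18 nt. A minimal sketch of a sanity check on the trimmed output follows. The function name `count_short_reads` is hypothetical (not from the source), and it assumes plain 4-line FASTQ records on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the published pipeline): count FASTQ
# records whose sequence is shorter than a minimum length.
# Assumes 4-line FASTQ records arriving on stdin.
count_short_reads() {
  local min_len="${1:-18}"
  # The sequence is line 2 of every 4-line record.
  awk -v min="$min_len" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}

# Toy input: one 20-nt read and one 10-nt read; only the second is below 18 nt.
printf '@r1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n' \
  | count_short_reads 18
```

On real data this could be run as `gzip -dc trimmed_reads.fastq.gz | count_short_reads 18`; a non-zero count would suggest the length filter was not applied as intended.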
  2.

    Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

    bowtie v1.2.3 (tool and version inferred with models/gemini-2.5-flash; the source reports Bowtie 1.0.0 for this mapping, see step 3)
    $ Bash example
    # Install bowtie
    # conda install -c bioconda bowtie=1.2.3
    
    # --- Reference Data Preparation (RepBase18.05) ---
    # RepBase access typically requires registration. The following is a placeholder for downloading and indexing.
    # Replace with actual download method if you have access to RepBase18.05 sequences.
    # Example: Download RepBase sequences (e.g., from girinst.org after registration)
    # wget -O RepBase18.05.fasta.gz "https://www.girinst.org/repbase/update/RepBase18.05.fasta.gz" # Placeholder URL
    # gunzip RepBase18.05.fasta.gz
    
    # Build bowtie index for RepBase18.05
    # bowtie-build RepBase18.05.fasta RepBase18.05_index
    
    # --- Mapping Reads to Repetitive Elements ---
    # Input: reads.fastq.gz (replace with your actual input reads file)
    # Output: reads_rep_mapped.sam (SAM file containing reads mapped to repetitive elements)
    # Parameters (inferred, not taken from the source; step 3 lists the parameters the authors report):
    #   -v 2: Allow up to 2 mismatches in the alignment.
    #   -m 1: Suppress all alignments for a read that has more than 1 reportable alignment.
    #   --best --strata: Report only alignments in the best stratum (fewest mismatches).
    #   -S: Output in SAM format (a flag; it takes no argument).
    #   RepBase18.05_index: The prefix for the RepBase bowtie index.
    #   reads.fastq.gz: The input FASTQ file (decompress first if your bowtie build cannot read gzipped input).
    #   reads_rep_mapped.sam: The output file, given as the last positional argument.
    
    bowtie -v 2 -m 1 --best --strata -S RepBase18.05_index reads.fastq.gz reads_rep_mapped.sam
  3.

    Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

    Bowtie v1.0.0
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie=1.0.0
    
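    # Assuming 'repbase_index' is the prefix for the Bowtie index files generated from Repbase sequences
    # Assuming 'reads.fastq' is the trimmed reads file from step 1
    # Assuming 'aligned.sam' is the desired output SAM file
    # --un is not among the parameters reported in the source; it is added here as an assumption
    # so that reads failing to align to Repbase are saved for the genome alignment in step 4
    
    bowtie -S -q -p 16 -e 100 -l 20 \
      --un input_reads_not_mapped_to_repbase.fastq \
      repbase_index reads.fastq > aligned.sam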
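
The next step works only on reads that did not align to Repbase. Bowtie's --un option can write those reads directly, and `samtools fastq -f 4` is the more usual way to pull them from a SAM or BAM file. As a dependency-free alternative, the sketch below recovers them from the SAM output with awk. The function name `sam_unmapped_to_fastq` is hypothetical, and the approach is an assumption about how this could be done, not the authors' method.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): convert unmapped SAM records
# (FLAG bit 0x4 set) into FASTQ records. Reads SAM on stdin.
sam_unmapped_to_fastq() {
  awk -F '\t' '
    /^@/ { next }                    # skip SAM header lines
    int($2 / 4) % 2 == 1 {           # FLAG bit 0x4 set: read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }'
}

# Toy SAM: r1 aligned to a repeat (FLAG 0), r2 unmapped (FLAG 4).
printf '@HD\tVN:1.0\nr1\t0\tAluY\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n' \
  | sam_unmapped_to_fastq
```

Applied to the output above, this would be `sam_unmapped_to_fastq < aligned.sam > input_reads_not_mapped_to_repbase.fastq`. The bit test uses integer division so that it runs under any POSIX awk, without gawk's bitwise functions.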
  4.

    Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.

    $ Bash example
    # Install STAR (example using conda)
    # conda install -c bioconda star=2.3.0e
    
    # Placeholder for STAR genome index for hg19
    # This index needs to be generated once using STAR --runMode genomeGenerate
    # Example command to generate index (replace paths and thread count as needed):
    # STAR --runMode genomeGenerate \
    #   --genomeDir /path/to/hg19_star_index \
    #   --genomeFastaFiles /path/to/hg19.fa \
    #   --sjdbGTFfile /path/to/hg19_genes.gtf \
    #   --runThreadN 8
    
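    # Align reads to the hg19 human genome.
    # Only --outSAMunmapped, --outFilterMultimapNmax and --outFilterMultimapScoreRange come from the source;
    # the paths, output prefix, thread count and --outSAMtype are illustrative additions.
    # --outSAMtype was introduced in later STAR releases and may be rejected by 2.3.0e;
    # if so, drop that line and sort/convert the SAM output with samtools instead.
    STAR \
      --genomeDir /path/to/hg19_star_index \
      --readFilesIn input_reads_not_mapped_to_repbase.fastq \
      --outFileNamePrefix aligned_to_hg19_ \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMunmapped Within \
      --outFilterMultimapNmax 1 \
      --outFilterMultimapScoreRange 1 \
      --runThreadN 8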
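
With --outFilterMultimapNmax 1 only uniquely aligned reads are kept, so it is worth checking how many reads survive. The sketch below pulls a single value out of STAR's `Log.final.out` summary. The function name `star_log_value` is hypothetical, and the `label |<TAB>value` layout is an assumption about the log format that should be checked against the file your STAR version actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): print the value for one label
# from a STAR Log.final.out-style file with "label |<TAB>value" lines.
star_log_value() {
  # $1 = exact label text, $2 = path to the log file
  awk -F '|' -v key="$1" '
    {
      label = $1; value = $2
      gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (label == key) { print value; exit }
    }' "$2"
}

# Toy log mimicking the assumed layout.
log_file="$(mktemp)"
printf '        Number of input reads |\t1000\n   Uniquely mapped reads number |\t900\n        Uniquely mapped reads %% |\t90.00%%\n' > "$log_file"
star_log_value "Uniquely mapped reads %" "$log_file"
rm -f "$log_file"
```

With the output prefix used above, the real file would be `aligned_to_hg19_Log.final.out`.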
  5.

    RPKMs for each gene annotated in gencode v17 were calculated from RIP-seq data using custom scripts

    GENCODE v17
    $ Bash example
    # Download GENCODE v17 annotation
    # mkdir -p references
    # wget -P references ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_17/gencode.v17.annotation.gtf.gz
    # gunzip references/gencode.v17.annotation.gtf.gz
    
    # Assume aligned RIP-seq BAM file is available: rip_seq_aligned.bam
    # And the GTF file is: references/gencode.v17.annotation.gtf
    
    # 1. Count reads per gene using featureCounts from the Subread package
    # conda install -c bioconda subread
    featureCounts -a references/gencode.v17.annotation.gtf -o gene_counts.txt -F GTF -t exon -g gene_id rip_seq_aligned.bam
    
    # 2. Get total mapped reads from the BAM file (excluding unmapped reads)
    # conda install -c bioconda samtools
    total_mapped_reads=$(samtools view -c -F 4 rip_seq_aligned.bam)
    
    # 3. Calculate RPKM for each gene with awk (a stand-in for the authors' unpublished custom scripts)
    # RPKM = (read_count * 10^9) / (gene_length * total_mapped_reads)
    # featureCounts output columns: Geneid, Chr, Start, End, Strand, Length, <BAM path> (read counts)
    # The file starts with a "# Program:..." comment line followed by a column header line; both are skipped.
    awk -F '\t' -v total_reads="$total_mapped_reads" '
    BEGIN { OFS="\t"; print "Geneid", "RPKM" }
    /^#/ { next }              # Skip the featureCounts comment line
    $1 == "Geneid" { next }    # Skip the column header line
    {
        gene_id = $1;
        gene_length = $6;      # Total exonic length of the gene in bp
        read_count = $7;       # Counts for the single BAM file given above
        if (gene_length > 0 && total_reads > 0) {
            rpkm = (read_count * 1e9) / (gene_length * total_reads);
            print gene_id, rpkm;
        } else {
            print gene_id, 0;  # Zero length or zero total reads
        }
    }' gene_counts.txt > gene_rpkm.txt
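
As a worked check of the RPKM formula, the sketch below computes the value for a single gene from three numbers. The function name `rpkm` and the example figures are hypothetical; the authors' custom scripts are not available, so this only illustrates the standard definition.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): RPKM for one gene.
#   RPKM = reads_in_gene * 1e9 / (gene_length_bp * total_mapped_reads)
rpkm() {
  # $1 = reads in gene, $2 = gene length in bp, $3 = total mapped reads
  awk -v c="$1" -v len="$2" -v n="$3" 'BEGIN {
    if (len <= 0 || n <= 0) { print "NA"; exit }   # undefined without a length or library size
    printf "%.4f\n", (c * 1e9) / (len * n)
  }'
}

# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads:
# 500 * 1e9 / (2000 * 1e7) = 25
rpkm 500 2000 10000000
```

Running a few genes through this by hand and comparing with `gene_rpkm.txt` is a quick way to confirm that the column indices in the awk script match the featureCounts output.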

Raw Source Text
Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within –outFilterMultimapNmax 1 –outFilterMultimapScoreRange 1.
RPKMs for each gene annotated in gencode v17 were calculated from RIP-seq data using custom scripts
Genome_build: hg19
Supplementary_files_format_and_content: rpkm files, contains RPKMs for each sample