GSE69586 Processing Pipeline


Publication

Target Discrimination in Nonsense-Mediated mRNA Decay Requires Upf1 ATPase Activity.

Molecular cell (2015) — PMID 26253027

Dataset

Target discrimination in nonsense-mediated mRNA decay requires Upf1 ATPase activity

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

cutadapt v2.10 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install cutadapt (e.g., using conda)
# conda install -c bioconda cutadapt

# Define input and output file names
INPUT_READS="reads.fastq.gz"
OUTPUT_TRIMMED_READS="trimmed_reads.fastq.gz"

# Run cutadapt to trim polyA tails, adapters, and low quality ends
cutadapt \
  --match-read-wildcards \
  --times 2 \
  -e 0 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b TGGAATTCTCGGGTGCCAAGG \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o "${OUTPUT_TRIMMED_READS}" \
  "${INPUT_READS}"

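The -m 18 setting tells cutadapt to discard reads shorter than 18 nt after trimming, so a trimmed file should contain no shorter reads. The following is a minimal sanity-check sketch, not part of the published pipeline. It assumes a standard four-line-per-record FASTQ on stdin; `count_short_reads` is a hypothetical helper name.

```shell
# Sketch: count FASTQ records whose sequence is shorter than a minimum length.
# `count_short_reads` is a hypothetical helper, not part of cutadapt.
count_short_reads() {
  # $1 = minimum read length; FASTQ (4 lines per record) is read from stdin
  awk -v min_len="$1" 'NR % 4 == 2 && length($0) < min_len { n++ } END { print n + 0 }'
}

# Demo on two inline records: one 20-nt read and one 10-nt read.
printf '@r1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n' \
  | count_short_reads 18
```

On real data, something like `gzip -dc trimmed_reads.fastq.gz | count_short_reads 18` would be expected to report 0 if the length filter was applied.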

Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

bowtie2 (Inferred with models/gemini-2.5-flash) v2.5.2 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install bowtie2 (if not already installed)
# conda install -c bioconda bowtie2

# --- Reference Data Preparation ---
# RepBase18.05 is a proprietary database. Access typically requires a license.
# The following is a placeholder for how one would prepare the index if the FASTA file was available.

# Placeholder for RepBase18.05 FASTA file (replace with actual path to your downloaded file)
# REPBASE_FASTA="path/to/RepBase18.05.fasta"
# REPBASE_INDEX_PREFIX="reference/RepBase/RepBase18.05"

# Create directory for reference if it doesn't exist
# mkdir -p $(dirname "${REPBASE_INDEX_PREFIX}")

# Build Bowtie2 index for RepBase18.05
# bowtie2-build "${REPBASE_FASTA}" "${REPBASE_INDEX_PREFIX}"

# --- Alignment Step ---

# Define input and output files
# Assuming paired-end reads, adjust for single-end if necessary
READ1="input_read1.fastq.gz"
READ2="input_read2.fastq.gz"

# Path to the Bowtie2 index prefix for RepBase18.05
# This should be the same prefix used during the bowtie2-build step (e.g., 'reference/RepBase/RepBase18.05')
REPBASE_INDEX_PREFIX="reference/RepBase/RepBase18.05"

OUTPUT_SAM="aligned_to_repeats.sam"
UNALIGNED_PREFIX="unaligned_to_repeats"
THREADS=8 # Number of threads to use

# Align reads against the RepBase18.05 repetitive elements database.
# Note: bowtie2 is an inferred stand-in here; the source text reports Bowtie 1.0.0 for this mapping.
# --very-sensitive-local allows local (soft-clipped) alignments, which is often used for repetitive elements.
# --un-conc-gz writes read pairs that fail to align concordantly to two gzipped files;
# bowtie2 derives their names from the given path by adding .1 and .2.
bowtie2 \
  --very-sensitive-local \
  -p "${THREADS}" \
  -x "${REPBASE_INDEX_PREFIX}" \
  -1 "${READ1}" \
  -2 "${READ2}" \
  --un-conc-gz "${UNALIGNED_PREFIX}" \
  -S "${OUTPUT_SAM}"

# Optional: Convert SAM to BAM and sort
# samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "${OUTPUT_SAM%.sam}.bam"
# samtools index "${OUTPUT_SAM%.sam}.bam"


Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

Bowtie v1.0.0 GitHub

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.0.0

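# Assuming 'repbase_index' is the base name of the Bowtie index files
# and 'reads.fastq' is the trimmed input reads file.
# Alignments are written to 'output.sam'.
# --un is not among the parameters listed in the source; it is added here as an assumption
# so that reads failing to align to Repbase are saved for the next step.
bowtie -S -q -p 16 -e 100 -l 20 --un unmapped_reads.fastq repbase_index reads.fastq > output.sam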

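The genome alignment that follows uses only reads that failed to align to Repbase. If those reads were not saved separately during alignment, they can be recovered from the SAM output. The following is a minimal sketch under stated assumptions: single-end reads, and a SAM file in which unaligned reads carry flag bit 0x4. `sam_unmapped_to_fastq` is a hypothetical helper name, not part of Bowtie.

```shell
# Sketch: write unaligned reads (SAM flag bit 0x4) from a SAM file as FASTQ.
# Assumes single-end reads; `sam_unmapped_to_fastq` is a hypothetical helper.
sam_unmapped_to_fastq() {
  awk -F'\t' '
    /^@/ { next }                  # skip SAM header lines
    int($2 / 4) % 2 == 1 {         # flag bit 0x4 set: read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }
  ' "$1"
}

# Demo: a tiny SAM file with one aligned and one unaligned read.
demo_sam="$(mktemp)"
{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf 'read1\t0\tAluY\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCCA\tIIIII\n'
} > "$demo_sam"
sam_unmapped_to_fastq "$demo_sam"
```

Only read2 is emitted in the demo. The bit test is written with integer arithmetic rather than a bitwise function so that it does not depend on a particular awk implementation.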

Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.

STAR v2.3.0e GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star=2.3.0e

# Create STAR genome index for hg19 if not already done
# STAR --runMode genomeGenerate --genomeDir /path/to/hg19_STAR_index --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf --runThreadN <num_threads>

# Align reads to hg19
STAR --genomeDir /path/to/hg19_STAR_index \
     --readFilesIn unmapped_reads.fastq \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 1 \
     --outFilterMultimapScoreRange 1 \
     --outFileNamePrefix star_hg19_alignment_

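With --outFilterMultimapNmax 1, only reads with a single reported alignment should appear as mapped, and --outSAMunmapped Within keeps unmapped reads in the same file. The following is a minimal sketch for checking this on a SAM file. It assumes the alignments carry the NH:i: attribute (number of reported alignments); records without it are counted separately as untagged. `summarize_star_sam` is a hypothetical helper name, not a STAR utility.

```shell
# Sketch: tally unique, multi-mapped, untagged, and unmapped records in a SAM file.
# `summarize_star_sam` is a hypothetical helper; it relies on the NH:i: attribute.
summarize_star_sam() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    int($2 / 4) % 2 == 1 { unmapped++; next }       # flag bit 0x4: unmapped
    {
      nh = ""
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0   # number of reported alignments
      if (nh == "") untagged++
      else if (nh == 1) unique++
      else multi++
    }
    END {
      printf "unique=%d multi=%d untagged=%d unmapped=%d\n", unique, multi, untagged, unmapped
    }
  ' "$1"
}

# Demo: two uniquely mapped reads and one unmapped read.
demo_star_sam="$(mktemp)"
{
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'a\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNH:i:1\n'
  printf 'b\t16\tchr1\t50\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNH:i:1\n'
  printf 'c\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\tNH:i:0\n'
} > "$demo_star_sam"
summarize_star_sam "$demo_star_sam"
```

A non-zero multi count on real output would suggest the multimapping filter was not applied as described.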

Reads that were PCR replicates were removed from each CLIP-seq library using a custom script.

umi_tools v1.1.2 (inferred stand-in; the source specifies only a custom script) GitHub

$ Bash example

# Install umi_tools
# conda install -c bioconda umi_tools

# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Define input and output file names
INPUT_BAM="input_aligned_reads.bam"
OUTPUT_DEDUP_BAM="deduplicated_reads.bam"

# Sort the BAM file by coordinate (required for umi_tools dedup)
samtools sort -o "${INPUT_BAM%.bam}.sorted.bam" "$INPUT_BAM"

# Index the sorted BAM file
samtools index "${INPUT_BAM%.bam}.sorted.bam"

# Remove PCR replicates using umi_tools dedup.
# Note: the source describes a custom script that keeps one read per 5' end position and does not mention UMIs;
# umi_tools is an inferred substitute and applies only if the reads carry UMIs.
# This command assumes the UMI is the last field of the read ID, separated by a colon.
# Adjust --extract-umi-method and --umi-separator if your UMI structure is different.
umi_tools dedup \
    --extract-umi-method=read_id \
    --umi-separator=':' \
    -I "${INPUT_BAM%.bam}.sorted.bam" \
    -S "$OUTPUT_DEDUP_BAM"

# Index the deduplicated BAM file
samtools index "$OUTPUT_DEDUP_BAM"


Briefly, one read was kept at each nucleotide position when more than one read's 5' end was mapped.

dedup_reads.py (Inferred with models/gemini-2.5-flash) vCustom Script (from Yeo Lab eCLIP pipeline) GitHub

$ Bash example

# Install Python (if not already available)
# conda install python=3.8

# Install pysam, a dependency for dedup_reads.py
# pip install pysam

# Clone the eclip repository to obtain the dedup_reads.py script
# git clone https://github.com/yeolab/eclip.git

# Define input and output file paths
INPUT_BAM="aligned_reads.bam" # Placeholder for the input BAM file
OUTPUT_DEDUP_BAM="deduplicated_reads.bam" # Placeholder for the deduplicated output BAM file

# Path to the dedup_reads.py script within the cloned repository.
# Assumption: this path and the -i/-o flags are inferred, not confirmed by the source; check the repository before running.
DEDUP_SCRIPT="eclip/scripts/dedup_reads.py"

# Execute the deduplication script.
# Intended behavior, per the source: when several reads share the same 5' end mapping position, keep only one.
python "${DEDUP_SCRIPT}" -i "${INPUT_BAM}" -o "${OUTPUT_DEDUP_BAM}"

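The rule described in the source (one read kept per position when several reads share a 5' end) can be illustrated directly on SAM text. This is a minimal sketch, not the authors' custom script. It assumes single-end reads, treats strand as part of the position, and keeps the first read encountered at each position; the source does not say which read was retained. `collapse_by_five_prime` is a hypothetical helper name.

```shell
# Sketch: keep the first read seen at each (chromosome, strand, 5' end) position.
# `collapse_by_five_prime` is a hypothetical helper, not the authors' script.
collapse_by_five_prime() {
  awk -F'\t' '
    /^@/ { print; next }                  # pass header lines through
    int($2 / 4) % 2 == 1 { next }         # skip unmapped reads (flag bit 0x4)
    {
      strand = (int($2 / 16) % 2 == 1) ? "-" : "+"
      five_prime = $4                     # forward strand: leftmost aligned base
      if (strand == "-") {
        # reverse strand: rightmost aligned base = POS + reference span - 1
        cigar = $6; ref_span = 0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
          n = substr(cigar, 1, RLENGTH - 1) + 0
          op = substr(cigar, RLENGTH, 1)
          if (op ~ /[MDN=X]/) ref_span += n   # only these operations consume reference
          cigar = substr(cigar, RLENGTH + 1)
        }
        five_prime = $4 + ref_span - 1
      }
      key = $3 SUBSEP strand SUBSEP five_prime
      if (!(key in seen)) { seen[key] = 1; print }
    }
  ' "$1"
}

# Demo: r1/r2 share a forward-strand 5p end; r3/r4 share a reverse-strand 5p end.
demo_dup_sam="$(mktemp)"
{
  printf 'r1\t0\tchr1\t100\t255\t20M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\tchr1\t100\t255\t18M\t*\t0\t0\t*\t*\n'
  printf 'r3\t16\tchr1\t81\t255\t20M\t*\t0\t0\t*\t*\n'
  printf 'r4\t16\tchr1\t83\t255\t18M\t*\t0\t0\t*\t*\n'
  printf 'r5\t0\tchr1\t101\t255\t20M\t*\t0\t0\t*\t*\n'
} > "$demo_dup_sam"
collapse_by_five_prime "$demo_dup_sam" | cut -f1
```

In the demo, r1, r3, and r5 are retained. For reverse-strand reads the 5' end is the rightmost aligned base, which is why the CIGAR string is parsed rather than using the POS field alone.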

Clusters were then assigned using the CLIPper software with parameters --bonferroni --superlocal --threshold- software (Lovci et al., 2013).

CLIPper vInferred from Lovci et al., 2013 publication GitHub

$ Bash example

# Installation (example, adjust based on environment):
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# # Ensure Python dependencies are installed (e.g., pip install numpy scipy pysam)

# Example usage of CLIPper based on the description.
# Note: the input BAM and output file names are placeholders.
# -s takes a species/genome build name (hg19 for this dataset), not a genome size file.
# The '--threshold-' parameter in the description is truncated; the option name and value cannot be recovered,
# so a placeholder [VALUE] is used. Check the CLIPper help output for the available threshold options.
# If CLIPper was installed as a package, the entry point may be 'clipper' rather than 'python clipper.py'.
python clipper.py \
  -b input_aligned_reads.bam \
  -s hg19 \
  -o clipper_peaks.bed \
  --bonferroni \
  --superlocal \
  --threshold [VALUE]


Conclusions discussed in the associated manuscript are based on the BAM files.

clipper (Inferred with models/gemini-2.5-flash) vunspecified GitHub

$ Bash example

# Clone the clipper repository
# git clone https://github.com/yeolab/clipper.git
# cd clipper

# Run clipper for peak calling.
# Replace 'input.bam' with your actual BAM file.
# -s names the species/genome build; this dataset was processed against hg19 (see Genome_build), not hg38.
# Replace 'output_peaks.bed' with your desired output file.
python clipper.py -b input.bam -s hg19 -o output_peaks.bed


Tools Used

cutadapt Bowtie STAR CLIPper

Raw Source Text

Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.
Reads that were PCR replicates were removed from each CLIP-seq library using a custom script. Briefly one read was kept at each nucleotide position when more than one read's 5' end was mapped
Clusters were then assigned using the CLIPper software with parameters --bonferroni --superlocal --threshold- software (Lovci et al., 2013).
Genome_build: hg19
conclusions discussed in the associated manuscript are based on the BAM files
