GSE69584 Processing Pipeline


Publication

Target Discrimination in Nonsense-Mediated mRNA Decay Requires Upf1 ATPase Activity.

Molecular Cell (2015), PMID 26253027

Dataset

GSE69584

Target discrimination in nonsense-mediated mRNA decay requires Upf1 ATPase activity (CLIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low-quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

    cutadapt v1.18 GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt
    
    # Define input and output file names (placeholders)
    INPUT_FASTQ="input.fastq.gz"
    OUTPUT_FASTQ="output.fastq.gz"
    
    # Run cutadapt to trim adapters, polyA/T tails, and low-quality ends
    cutadapt \
      --match-read-wildcards \
      --times 2 \
      -e 0 \
      -O 5 \
      --quality-cutoff 6 \
      -m 18 \
      -b TCGTATGCCGTCTTCTGCTTG \
      -b ATCTCGTATGCCGTCTTCTGCTTG \
      -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
      -b TGGAATTCTCGGGTGCCAAGG \
      -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
      -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
      -o "${OUTPUT_FASTQ}" \
      "${INPUT_FASTQ}"
  2. 2

    Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

    bowtie (Inferred with models/gemini-2.5-flash) v1.1.2 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install bowtie and samtools if not already available
    # conda install -c bioconda bowtie samtools
    
    # Placeholder for the RepBase 18.05 FASTA file. Repbase is distributed by GIRI and may require a license,
    # so the URL below is only an example and must be replaced with a source you have access to.
    # wget -O RepBase18.05.fasta.gz "http://www.girinst.org/repbase/update/RepBase18.05.fasta.gz"
    # gunzip RepBase18.05.fasta.gz
    
    # Build the bowtie index for RepBase 18.05
    bowtie-build RepBase18.05.fasta RepBase18.05_index
    
    # Map reads against the RepBase 18.05 index
    # NOTE: the options below are inferred and illustrative; the parameters reported for this dataset are listed in step 3.
    # -S: output SAM format
    # -v 2: allow up to 2 mismatches
    # -m 1: suppress all alignments for reads with more than one reportable alignment
    # --best --strata: report only alignments in the best mismatch stratum
    # --chunkmbs 1024: memory (in MB) available per thread for best-first search frames
    # --threads 8: use 8 threads for mapping
    # input_reads.fastq: Placeholder for the input FASTQ file (e.g., adapter-trimmed reads)
    # repbase_mapped.bam: Output BAM file containing the alignments against repetitive elements
    bowtie -S -v 2 -m 1 --best --strata --chunkmbs 1024 --threads 8 RepBase18.05_index input_reads.fastq | samtools view -bS - > repbase_mapped.bam
  3. 3

    Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

    Bowtie v1.0.0 GitHub
    $ Bash example
    # Install Bowtie 1.0.0 (example using conda)
    # conda install -c bioconda bowtie=1.0.0
    
    # Assuming 'repbase_index' is the path to the Bowtie index generated from Repbase sequences
    # Assuming 'reads.fastq' is the input reads file
    # Assuming 'output.sam' is the desired output SAM file
    
    bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > output.sam
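
    The next step aligns only the reads that did not map to Repbase, so those reads have to be written out somewhere. The source does not say how this was done. A minimal sketch, assuming Bowtie's `--un` option is used to save unaligned reads and that the input FASTQ is uncompressed; the function name `filter_repbase` and the output file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): align reads to a Repbase index
# with the reported parameters and save the reads that fail to align.
#   $1 = input FASTQ (uncompressed), $2 = Bowtie index basename, $3 = output prefix
filter_repbase() {
  local reads="$1" index="$2" prefix="$3"
  # --un writes reads with no reported alignment to the named FASTQ file;
  # the SAM alignments themselves go to standard output.
  bowtie -S -q -p 16 -e 100 -l 20 \
    --un "${prefix}.repbase_unmapped.fastq" \
    "${index}" "${reads}" > "${prefix}.repbase.sam"
}

# Example call (placeholder names):
# filter_repbase reads.fastq repbase_index sample
```

    The resulting `sample.repbase_unmapped.fastq` would then serve as the input to the genome alignment.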
  4. 4

    Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1.

    $ Bash example
    # Install STAR version 2.3.0e (example using conda)
    # This release is old and may not be packaged on bioconda; if the install fails, build it from the STAR source instead.
    # conda create -n star_env -c bioconda star=2.3.0e
    # conda activate star_env
    
    # Align reads to hg19
    # Replace /path/to/STAR_index/hg19 with the actual path to your hg19 STAR genome index.
    # Replace input_reads.fastq with the FASTQ file of reads that did not map to Repbase (step 3).
    # Replace output_prefix_ with your desired output file prefix.
    STAR --genomeDir /path/to/STAR_index/hg19 \
         --readFilesIn input_reads.fastq \
         --outFileNamePrefix output_prefix_ \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 1 \
         --outFilterMultimapScoreRange 1
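
    The duplicate-removal examples that follow expect a coordinate-sorted BAM file, whereas the STAR command above writes SAM by default (`<prefix>Aligned.out.sam`). The source does not describe this conversion. A minimal sketch, assuming samtools 1.3 or later; the function name `sort_star_output` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): turn STAR's SAM output into a
# coordinate-sorted, indexed BAM file.
#   $1 = STAR output prefix (as passed to --outFileNamePrefix), $2 = output BAM
sort_star_output() {
  local prefix="$1" out_bam="$2"
  # Sort by coordinate; index only if sorting succeeded.
  samtools sort -o "${out_bam}" "${prefix}Aligned.out.sam" &&
    samtools index "${out_bam}"
}

# Example call (names match the placeholders used in the surrounding steps):
# sort_star_output output_prefix_ aligned_sorted.bam
```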
  5. 5

    Reads that were PCR replicates were removed from each CLIP-seq library using a custom script.

    $ Bash example
    # Install samtools (if not already installed)
    # conda install -c bioconda samtools
    
    # Define input and output file names
    INPUT_BAM="aligned_sorted.bam" # Replace with your actual input aligned and sorted BAM file
    OUTPUT_BAM="deduplicated.bam"
    METRICS_FILE="deduplication_metrics.txt"
    
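    # Remove PCR replicates using samtools markdup (a stand-in for the authors' custom script, which is not provided)
    # The input BAM must be coordinate-sorted.
    # -r: Remove duplicate reads (rather than just marking them)
    # -s: Output statistics to stderr (redirected to METRICS_FILE)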
    samtools markdup -r -s "${INPUT_BAM}" "${OUTPUT_BAM}" 2> "${METRICS_FILE}"
  6. 6

    Briefly, one read was kept at each nucleotide position when more than one read’s 5' end was mapped to that position.

    sambamba markdup (Inferred with models/gemini-2.5-flash) v0.7.1 (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install sambamba (example using conda)
    # conda install -c bioconda sambamba
    
    # Deduplicate reads based on 5' end mapping position
    # -r: Remove duplicate reads
    # -t: Number of threads (example: 8)
    sambamba markdup -r -t 8 input.bam output_dedup.bam
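
    The rule described in this step (keep one read per 5' end position) can also be sketched directly on SAM text. This is not the authors' custom script, which is not available in the source. It is a minimal illustration that makes several assumptions: reads are single-end, strands are treated separately, the 5' end is taken from the aligned bases (soft clips ignored), and the first read seen at a position is the one kept. The function name `dedup_by_5prime` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): read SAM on stdin and keep one
# read per (reference, strand, five-prime end position); write SAM to stdout.
dedup_by_5prime() {
  awk -F'\t' '
    /^@/ { print; next }                  # pass header lines through unchanged
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) next    # drop unmapped reads (flag 0x4)
      reverse = int(flag / 16) % 2        # flag 0x10: read aligned to reverse strand
      pos5 = $4 + 0                       # leftmost aligned position (1-based)
      if (reverse) {
        # On the reverse strand the five-prime end is the rightmost aligned base,
        # so add the reference-consuming CIGAR lengths (M, D, N, =, X).
        cigar = $6; span = 0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
          len = substr(cigar, 1, RLENGTH - 1) + 0
          op = substr(cigar, RLENGTH, 1)
          if (op ~ /[MDN=X]/) span += len
          cigar = substr(cigar, RLENGTH + 1)
        }
        pos5 = pos5 + span - 1
      }
      key = $3 SUBSEP reverse SUBSEP pos5
      if (!(key in seen)) { seen[key] = 1; print }   # keep the first read only
    }'
}

# Example use with samtools (placeholder file names):
# samtools view -h aligned_sorted.bam | dedup_by_5prime | samtools view -b -o dedup_5prime.bam -
```

    Unlike `sambamba markdup`, which follows Picard-style duplicate criteria, this sketch keys only on the 5' end, which is closer to the wording of the description.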
  7. 7

    Clusters were then assigned using the CLIPper software (Lovci et al., 2013) with parameters --bonferroni --superlocal --threshold- (the last option is truncated in the source description).

    CLIPper (version described in Lovci et al., 2013) GitHub
    $ Bash example
    # Install CLIPper (example, adjust as needed)
    # CLIPper is a Python package; cloning the repository and installing it should provide a `clipper` command:
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper && python setup.py install
    
    # Example execution of CLIPper for cluster assignment
    # Replace 'clip_reads.bam' with your deduplicated CLIP-seq alignment file (sorted, indexed BAM).
    # -s hg19 matches the genome build used for alignment.
    # The '--threshold-' option is truncated in the description, so it is omitted here rather than guessed;
    # run `clipper --help` to see the available threshold options.
    clipper \
        -b clip_reads.bam \
        -s hg19 \
        -o clipper_peaks.bed \
        --bonferroni \
        --superlocal
  8. 8

    Conclusions discussed in the associated manuscript are based on the BAM files.

    samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already available
    # conda install -c bioconda samtools
    
    # Generate alignment statistics from the BAM file.
    # These statistics provide a summary of the alignment quality and read counts,
    # which are often used to inform conclusions about the sequencing data.
    # Replace 'input.bam' with the actual path to your BAM file.
    # Replace 'output_dir' with your desired output directory (created here if missing).
    mkdir -p output_dir
    samtools flagstat input.bam > output_dir/input.bam.flagstat.txt


Raw Source Text
Sequencing reads from CLIP-seq and RIP-seq libraries were first trimmed of polyA tails, adapters, and low quality ends using cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the hg19 human genome (UCSC assembly) using STAR (Dobin et al., 2013) version 2.3.0e with parameters --outSAMunmapped Within –outFilterMultimapNmax 1 –outFilterMultimapScoreRange 1.
Reads that were PCR replicates were removed from each CLIP-seq library using a custom script. Briefly one read was kept at each nucleotide position when more than one read’s 5' end was mapped
Clusters were then assigned using the CLIPper software with parameters --bonferroni --superlocal --threshold- software (Lovci et al., 2013).
Genome_build: hg19
conclusions discussed in the associated manuscript are based on the BAM files