GSE86041 Processing Pipeline

RIP-Seq

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

GSE86041

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [iCLIP-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Raw CLIP-seq reads were trimmed of polyA tails, adapters, and low-quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

    cutadapt v2.10 GitHub
    $ Bash example
    # Install cutadapt (e.g., using conda)
    # conda install -c bioconda cutadapt=2.10
    
    cutadapt \
      --match-read-wildcards \
      --times 2 \
      -e 0 \
      -O 5 \
      --quality-cutoff 6 \
      -m 18 \
      -b TCGTATGCCGTCTTCTGCTTG \
      -b ATCTCGTATGCCGTCTTCTGCTTG \
      -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
      -b TGGAATTCTCGGGTGCCAAGG \
      -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
      -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
      -o trimmed_reads.fastq.gz \
      raw_reads.fastq.gz
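
    The -m 18 option discards reads shorter than 18 nt after trimming, which is easy to verify on the output. The following is a minimal sketch, not part of the published pipeline: the helper name fastq_length_summary and the toy reads are hypothetical, and it assumes a plain FASTQ with four lines per record on standard input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fastq_length_summary (hypothetical helper): print the read count, the
# shortest read length, and how many reads fall below a minimum length
# (default 18, matching cutadapt's -m 18). Expects 4-line FASTQ on stdin.
fastq_length_summary() {
  local min_len="${1:-18}"
  awk -v min="$min_len" '
    NR % 4 == 2 {                      # sequence line of each record
      n++
      len = length($0)
      if (n == 1 || len < shortest) shortest = len
      if (len < min) too_short++
    }
    END { printf "reads=%d shortest=%d below_min=%d\n", n, shortest, too_short + 0 }
  '
}

# Toy input: two reads of length 20 and 18 (illustrative data only).
printf '@r1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIII\n' \
  | fastq_length_summary 18
```

    On real data the same function could be fed with gzip -dc trimmed_reads.fastq.gz | fastq_length_summary 18; a nonzero below_min value would suggest the length filter was not applied.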
  2.

    Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009).

    Bowtie v1.0.0 GitHub
    $ Bash example
    # Install Bowtie 1 if needed (the pipeline reports version 1.0.0)
    # conda install -c bioconda bowtie
    
    # Trimmed reads from step 1 (placeholder name). Older Bowtie 1 releases may not
    # read gzip-compressed input, so decompress trimmed_reads.fastq.gz first if needed.
    TRIMMED_READS="trimmed_reads.fastq"
    # Prefix of a Bowtie index built with bowtie-build from the RepBase (version 18.05)
    # repetitive-element sequences (placeholder name)
    BOWTIE_INDEX="repbase_18.05"
    OUTPUT_SAM="mapped_to_repbase.sam"
    
    # Map with the reported parameters: SAM output (-S), FASTQ input (-q), 16 threads (-p),
    # maximum summed mismatch quality of 100 (-e), and seed length 20 (-l).
    # Assumption, not stated in the source: adding --un <file> would write the reads
    # that fail to align, which the next step needs as input.
    bowtie -S -q -p 16 -e 100 -l 20 "${BOWTIE_INDEX}" "${TRIMMED_READS}" > "${OUTPUT_SAM}"
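
    Genome mapping uses only the reads that did not align to repetitive elements, so those reads have to be pulled out of the repeat alignment. The sketch below shows one way to do that from SAM output, assuming unmapped reads are present in the file and carry FLAG bit 0x4. The helper name sam_unmapped_to_fastq and the toy records are hypothetical; samtools or Bowtie's own --un option are likely more practical on real data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sam_unmapped_to_fastq (hypothetical helper): read SAM on stdin and write
# the unmapped reads to stdout as FASTQ records.
sam_unmapped_to_fastq() {
  awk -F '\t' '
    /^@/ { next }                          # skip SAM header lines
    int($2 / 4) % 2 == 1 {                 # FLAG bit 0x4 set: read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }
  '
}

# Toy SAM: r1 aligned to a repeat, r2 did not align (illustrative data only).
printf '@HD\tVN:1.0\tSO:unsorted\nr1\t0\tL1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\nr2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n' \
  | sam_unmapped_to_fastq
```

    Only r2 is emitted for the toy input. The integer arithmetic on the FLAG field avoids awk's non-portable bitwise functions.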
  4.

    Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 (Dobin et al. 2013).

    $ Bash example
    # Install STAR if not already installed. The version string "2.3.03" is quoted
    # from the source; check which releases your package manager actually provides.
    # conda install -c bioconda star
    
    # STAR genome index directory for mm9 (UCSC assembly).
    # You would need to download or build the mm9 STAR index first.
    # Example command to build index (replace paths and threads):
    # STAR --runMode genomeGenerate --genomeDir /path/to/mm9_star_index --genomeFastaFiles /path/to/mm9.fa --sjdbGTFfile /path/to/mm9.gtf --runThreadN <num_threads>
    GENOME_DIR="/path/to/mm9_star_index" # Replace with actual path to mm9 STAR index
    
    # Input FASTQ of reads that did not map to repetitive elements (placeholder name)
    INPUT_READS="input_reads.fastq"
    
    # Placeholder for output prefix
    OUTPUT_PREFIX="mapped_reads"
    
    # --outSAMunmapped, --outFilterMultimapNmax and --outFilterMultimapScoreRange are the
    # reported parameters. --outSAMtype and --runThreadN are additions not stated in the
    # source; --outSAMtype may not be available in STAR 2.3.x, in which case drop it
    # and work from the plain SAM output.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_READS}" \
         --outSAMunmapped Within \
         --outFilterMultimapNmax 1 \
         --outFilterMultimapScoreRange 1 \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --runThreadN 8 # Example: Adjust number of threads as needed
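
    Because --outSAMunmapped Within keeps unmapped reads in the alignment output, and --outFilterMultimapNmax 1 leaves only uniquely mapped reads as aligned, a quick tally of the two groups is a reasonable sanity check. The following is a sketch under those assumptions; the helper name sam_mapping_tally and the toy records are hypothetical, and it expects SAM text on standard input (for BAM, samtools view -h would produce it).

```shell
#!/usr/bin/env bash
set -euo pipefail

# sam_mapping_tally (hypothetical helper): count mapped and unmapped reads
# in SAM text read from stdin, ignoring secondary alignments.
sam_mapping_tally() {
  awk -F '\t' '
    /^@/ { next }                               # skip SAM header lines
    int($2 / 256) % 2 == 1 { next }             # FLAG bit 0x100: secondary alignment
    int($2 / 4) % 2 == 1 { unmapped++; next }   # FLAG bit 0x4: unmapped read
    { mapped++ }
    END { printf "mapped=%d unmapped=%d\n", mapped + 0, unmapped + 0 }
  '
}

# Toy SAM body: one mapped read, one unmapped read, one secondary alignment
# (illustrative data only; printf repeats the format for each group of six values).
printf '%s\t%s\t%s\t%s\t%s\t%s\t*\t0\t0\t*\t*\n' \
  r1 0 chr1 100 255 10M \
  r2 4 '*' 0 0 '*' \
  r3 256 chr1 500 0 10M \
  | sam_mapping_tally
```

    A very low mapped fraction on real data would be a cue to recheck the index build or the repeat-filtering handoff before calling peaks.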
  6.

    Reads having the same 5’ mapping position were collapsed to a single read to eliminate PCR duplication.

    samtools markdup (Inferred with models/gemini-2.5-flash) v1.19 GitHub
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Input BAM file, sorted by coordinate (placeholder names)
    INPUT_BAM="aligned_reads.bam"
    OUTPUT_BAM="deduplicated_reads.bam"
    
    # Approximate the collapsing of reads that share a 5' mapping position.
    # samtools markdup is an inferred stand-in; the source does not name the tool used.
    # The -r option removes duplicate reads instead of just marking them.
    # For paired-end data, run samtools fixmate -m on name-sorted reads and re-sort first.
    samtools markdup -r "${INPUT_BAM}" "${OUTPUT_BAM}"
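
    The collapsing rule itself can be written out directly, which makes the definition of "same 5’ mapping position" explicit. This is a minimal sketch rather than the authors' implementation: it keeps the first read seen for each combination of reference name, strand, and 5’ coordinate, takes the 5’ end of a reverse-strand read as its rightmost aligned base (computed from the CIGAR string), and ignores soft-clipped bases. The helper name collapse_by_five_prime and the toy records are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# collapse_by_five_prime (hypothetical helper): read SAM on stdin, drop
# unmapped reads, and keep one read per (reference, strand, 5-prime position).
collapse_by_five_prime() {
  awk -F '\t' '
    # Number of reference bases covered by a CIGAR string (M, D, N, =, X).
    function ref_len(cigar,   total, n, op) {
      total = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) total += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      return total
    }
    /^@/ { print; next }                   # pass header lines through
    int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4: unmapped, skip
    {
      rev  = int($2 / 16) % 2              # FLAG bit 0x10: reverse strand
      five = rev ? $4 + ref_len($6) - 1 : $4
      key  = $3 SUBSEP (rev ? "-" : "+") SUBSEP five
      if (!(key in seen)) { seen[key] = 1; print }
    }
  '
}

# Toy SAM body (illustrative data only): r2 duplicates r1 on the forward strand,
# r4 duplicates r3 on the reverse strand, r5 starts one base later.
printf '%s\t%s\t%s\t%s\t255\t%s\t*\t0\t0\t*\t*\n' \
  r1 0 chr1 100 10M \
  r2 0 chr1 100 8M \
  r3 16 chr1 91 10M \
  r4 16 chr1 95 6M \
  r5 0 chr1 101 10M \
  | collapse_by_five_prime | cut -f1
```

    For the toy input this keeps r1, r3, and r5. Keeping the first read per position is an arbitrary choice; a real pipeline might prefer the highest-quality read instead.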
  7.

    CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).

    $ Bash example
    # CLIPper is an inferred stand-in: the source only cites Zisoulis et al. (2010)
    # for peak calling and does not name a tool.
    # Install clipper (if not already installed)
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # python setup.py install
    
    # Placeholder variables - replace these with actual file paths
    IP_BAM="path/to/your/ip.bam"          # deduplicated alignments from the previous step
    SPECIES="mm9"                         # genome build reported for this dataset
    OUTPUT_BED="clipper_peaks/peaks.bed"
    
    # Create output directory
    mkdir -p "$(dirname "${OUTPUT_BED}")"
    
    # Execute clipper with the basic options from its README (-b BAM, -s species, -o output).
    # Check clipper --help to confirm that your installed version supports mm9.
    clipper \
        -b "${IP_BAM}" \
        -s "${SPECIES}" \
        -o "${OUTPUT_BED}"


Raw Source Text
Raw CLIP-seq reads were trimmed of polyA tails, adapters and low quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff' 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009). Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within –outFilterMultimapNmax 1 –outFilterMultimapScoreRange 1 (Dobin et al. 2013). Reads having the same 5’ mapping position were collapsed to a single read to eliminate PCR duplication. CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).
Genome_build: mm9
Supplementary_files_format_and_content: peaks.bed and bigwig