GSE86041 Processing Pipeline
Publication
Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System. Neuron (2016). PMID 27773581
Dataset
GSE86041: HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [iCLIP-seq]
Processing Steps
1
Raw CLIP-seq reads were trimmed of polyA tails, adapters, and low-quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.
$ Bash example
# Install cutadapt (e.g., using conda)
# conda install -c bioconda cutadapt=2.10
cutadapt \
  --match-read-wildcards \
  --times 2 \
  -e 0 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b TGGAATTCTCGGGTGCCAAGG \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o trimmed_reads.fastq.gz \
  raw_reads.fastq.gz
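Because the trimming step sets -m 18, no read shorter than 18 nt should survive it. The sketch below is one way to check this on an uncompressed, four-line-per-record FASTQ file. The helper name count_short_reads and the demo reads are hypothetical and are not part of the published pipeline.

```shell
# Hypothetical helper: count reads shorter than a minimum length.
# Assumes an uncompressed FASTQ file with exactly four lines per record.
count_short_reads() {
    # $1 = FASTQ file, $2 = minimum read length
    awk -v min="$2" '
        NR % 4 == 2 && length($0) < min { n++ }   # line 2 of each record is the sequence
        END { print n + 0 }                       # print 0 when nothing was counted
    ' "$1"
}

# Made-up demo data (not real reads): sequences of 18, 20, and 10 nt.
demo_fastq=$(mktemp)
printf '@r1\nACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIII\n' > "$demo_fastq"
printf '@r2\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' >> "$demo_fastq"
printf '@r3\nACGTACGTAC\n+\nIIIIIIIIII\n' >> "$demo_fastq"

# Reports how many reads fall below the 18 nt threshold.
count_short_reads "$demo_fastq" 18
```

On a correctly trimmed file the count should be zero. A gzipped file such as trimmed_reads.fastq.gz would need to be decompressed first.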
2
Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009).
$ Bash example
# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie

# Define input and output files
# Replace 'trimmed_reads.fastq' with your actual trimmed reads file
# (older Bowtie releases may not read gzipped FASTQ; decompress the Cutadapt output first if needed)
TRIMMED_READS="trimmed_reads.fastq"
# Replace 'repbase_18.05' with the path to your Bowtie index for repetitive elements
# This index should be built from the RepBase (version 18.05) repetitive elements database
BOWTIE_INDEX="repbase_18.05"
OUTPUT_SAM="mapped_to_repbase.sam"

# Run Bowtie with the parameters reported in the source
bowtie -S -q -p 16 -e 100 -l 20 "${BOWTIE_INDEX}" "${TRIMMED_READS}" > "${OUTPUT_SAM}"
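The source does not say how the reads that failed to map to repetitive elements were separated out for the genome-mapping step. One possible approach, sketched below, is to pull unmapped records (FLAG bit 0x4 set) out of the Bowtie SAM output and write them back as FASTQ. The helper name sam_unmapped_to_fastq and the demo records are hypothetical.

```shell
# Hypothetical helper: print unmapped SAM records (FLAG bit 0x4 set) as FASTQ.
sam_unmapped_to_fastq() {
    # $1 = SAM file
    awk -F '\t' '
        /^@/ { next }                        # skip SAM header lines
        int($2 / 4) % 2 == 1 {               # FLAG bit 0x4 means the read is unmapped
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    ' "$1"
}

# Made-up demo SAM (not real data): read1 maps to a repeat, read2 is unmapped.
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$demo_sam"
printf 'read1\t0\tL1_repeat\t5\t255\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\tIIIIIIIIIIIIIIIIIIII\n' >> "$demo_sam"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCCCCAAAATTTT\tIIIIIIIIIIIIIIIIIIII\n' >> "$demo_sam"

# Only read2 is written out as a FASTQ record.
sam_unmapped_to_fastq "$demo_sam"
```

Bowtie 1 also provides an --un option that writes unaligned reads to a file directly, which avoids the SAM round trip. Whether the original authors used it is not stated.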
4
Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 (Dobin et al. 2013).
$ Bash example
# Install STAR if not already installed
# (the source cites version 2.3.03; pin whichever matching release your channel provides)
# conda install -c bioconda star

# Placeholder for STAR genome index directory for mm9 (UCSC assembly)
# You would need to download or build the mm9 STAR index first.
# Example command to build index (replace paths and threads):
# STAR --runMode genomeGenerate --genomeDir /path/to/mm9_star_index --genomeFastaFiles /path/to/mm9.fa --sjdbGTFfile /path/to/mm9.gtf --runThreadN <num_threads>
GENOME_DIR="/path/to/mm9_star_index"  # Replace with actual path to mm9 STAR index

# Placeholder for input reads file (FASTQ format, pre-filtered for repetitive elements)
INPUT_READS="input_reads.fastq"  # Replace with your actual input FASTQ file

# Placeholder for output prefix
OUTPUT_PREFIX="mapped_reads"

# --outSAMtype and --runThreadN are not among the reported parameters; they are
# conveniences, and --outSAMtype may require a newer STAR release than the one cited.
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_READS}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --runThreadN 8  # Example: adjust number of threads as needed
6
Reads having the same 5' mapping position were collapsed to a single read to eliminate PCR duplication.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# The source does not name the tool used for this step; samtools markdup is
# shown here as a stand-in and may not reproduce the original collapsing exactly.

# Input BAM file (must be sorted by coordinate)
INPUT_BAM="aligned_reads.bam"
OUTPUT_BAM="deduplicated_reads.bam"

# Collapse reads having the same 5' mapping position to a single read to eliminate PCR duplication.
# The -r option removes duplicate reads instead of just marking them.
samtools markdup -r "${INPUT_BAM}" "${OUTPUT_BAM}"
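To make the collapsing rule concrete, the sketch below keeps the first read seen at each 5' mapping position, working on reads in BED6 format. Treating the 5' end as strand-aware (start coordinate on the plus strand, end coordinate on the minus strand) is an assumption; the source says only "same 5' mapping position". The helper name collapse_by_5prime and the demo reads are hypothetical.

```shell
# Hypothetical helper: keep one read per (chromosome, 5' position, strand).
# Input is BED6: chrom, start, end, name, score, strand (tab-separated).
collapse_by_5prime() {
    # $1 = BED6 file
    awk -F '\t' '
        {
            # Assumption: the 5-prime end is the start coordinate on "+"
            # and the end coordinate on "-".
            five_prime = ($6 == "-") ? $3 : $2
            key = $1 FS five_prime FS $6
            if (!(key in seen)) {            # first read at this position is kept
                seen[key] = 1
                print
            }
        }
    ' "$1"
}

# Made-up demo reads: r2 duplicates r1 (same start on +),
# r4 duplicates r3 (same end on -), r5 is unique.
demo_bed=$(mktemp)
printf 'chr1\t100\t130\tr1\t0\t+\n' > "$demo_bed"
printf 'chr1\t100\t125\tr2\t0\t+\n' >> "$demo_bed"
printf 'chr1\t90\t130\tr3\t0\t-\n' >> "$demo_bed"
printf 'chr1\t95\t130\tr4\t0\t-\n' >> "$demo_bed"
printf 'chr1\t101\t131\tr5\t0\t+\n' >> "$demo_bed"

# Prints the records for r1, r3, and r5.
collapse_by_5prime "$demo_bed"
```

A BAM file can be converted to this BED layout with bedtools bamtobed. Which read of a duplicate group is kept (here, the first encountered) is an arbitrary choice in this sketch.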
7
CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).
$ Bash example
# The source describes peak calling only as "previously described" (Zisoulis et al.)
# and does not name a tool. CLIPper is shown here as a stand-in, not as the authors' method.

# Install clipper (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# python setup.py install

# Placeholder variables - replace these with actual file paths
IP_BAM="path/to/your/deduplicated_reads.bam"
OUTPUT_BED="peaks.bed"
SPECIES="mm9"  # Matches the genome build used for mapping

# Call peaks on the deduplicated alignments
clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"
Raw Source Text
Raw CLIP-seq reads were trimmed of polyA tails, adapters and low quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009). Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 (Dobin et al. 2013). Reads having the same 5' mapping position were collapsed to a single read to eliminate PCR duplication. CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).
Genome_build: mm9
Supplementary_files_format_and_content: peaks.bed and bigwig