GSE86227 Processing Pipeline

ncRNA-Seq code_examples 5 steps

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_small_r…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Sequencing reads from small RNA-seq libraries were first trimmed of adapters using cutadapt.

cutadapt (version not specified; assuming a recent stable release, e.g., 4.x)

$ Bash example

# Install cutadapt (choose one method)
# conda install -c bioconda cutadapt
# pip install cutadapt

# Define input and output files
INPUT_FASTQ="small_rna_seq_reads.fastq.gz" # Placeholder for your input small RNA-seq FASTQ file
OUTPUT_FASTQ="small_rna_seq_reads_trimmed.fastq.gz" # Placeholder for your output trimmed FASTQ file

# Define the 3' adapter sequence for small RNA-seq.
# This is a common Illumina TruSeq Small RNA 3' adapter. 
# Verify the exact adapter sequence used in your library preparation kit.
ADAPTER_SEQUENCE="TGGAATTCTCGGGTGCCAAGGAACTCCAG"

# Define minimum length for trimmed reads (common for small RNA-seq, e.g., 18-20 bp)
MIN_LENGTH=18

# Number of CPU threads to use for parallel processing
NUM_THREADS=4

# Execute cutadapt to trim adapters
# -a: Specifies the 3' adapter sequence to be removed
# --minimum-length: Discards reads shorter than this length after trimming
# --discard-untrimmed: Discards reads that do not contain the adapter sequence
# -o: Specifies the output file for trimmed reads
# -j: Specifies the number of CPU threads to use
cutadapt \
  -a "${ADAPTER_SEQUENCE}" \
  --minimum-length "${MIN_LENGTH}" \
  --discard-untrimmed \
  -o "${OUTPUT_FASTQ}" \
  -j "${NUM_THREADS}" \
  "${INPUT_FASTQ}"

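After trimming, it can help to confirm that the surviving reads fall in the size range expected for small RNAs. The following is a minimal sketch, not part of the published pipeline: `read_length_histogram` is a hypothetical helper name, and it assumes a standard FASTQ file with exactly four lines per record.

```shell
# Sketch: summarize read lengths in a FASTQ file.
# read_length_histogram is a hypothetical helper, not part of cutadapt or the source pipeline.
# Usage: read_length_histogram FILE   (plain or gzip-compressed FASTQ, 4 lines per record)
# Output: one "<length><TAB><count>" line per observed read length, sorted by length.
read_length_histogram() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress gzip input on the fly
    *)    cat "$1" ;;        # plain-text FASTQ
  esac \
    | awk 'NR % 4 == 2 { counts[length($0)]++ }   # line 2 of each record is the sequence
           END { for (len in counts) printf "%d\t%d\n", len, counts[len] }' \
    | sort -n
}

# Example call (file name is the placeholder from the cutadapt example):
# read_length_histogram small_rna_seq_reads_trimmed.fastq.gz
```

With the `--minimum-length 18` setting used in the example above, no length below 18 should appear in the output.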

Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

bowtie2 v2.4.5 (tool and version inferred with models/gemini-2.5-flash; the source text reports Bowtie 1.0.0 for this mapping)

$ Bash example

# Install bowtie2 (example using conda)
# conda install -c bioconda bowtie2

# Install samtools (example using conda)
# conda install -c bioconda samtools

# --- Reference Data Preparation ---
# Download RepBase18.05 (Note: RepBase is a commercial database. 
# For this example, we assume a FASTA file 'RepBase18.05.fasta' is available.
# You would typically obtain this from a licensed source or a derived public subset.)
# Example: wget -O RepBase18.05.fasta "URL_TO_REPBASE_FASTA"

# Build Bowtie2 index for RepBase18.05
bowtie2-build RepBase18.05.fasta RepBase18.05_index

# --- Mapping Reads ---
# Define the input read file (replace with the actual path to your trimmed FASTQ file).
# Single-end input is assumed here, matching the single FASTQ produced by the trimming step.
READS="small_rna_seq_reads_trimmed.fastq.gz"
OUTPUT_BAM="mapped_to_repeats.bam"
THREADS=8 # Number of threads to use for bowtie2

# Map reads against the RepBase index, then convert to a coordinate-sorted BAM
bowtie2 --very-sensitive-local -p "${THREADS}" \
        -x RepBase18.05_index \
        -U "${READS}" \
        | samtools view -b - \
        | samtools sort -o "${OUTPUT_BAM}" -


Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

Bowtie v1.0.0

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.0.0

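# Align reads using Bowtie with the parameters reported in the source:
#   -S      write SAM output
#   -q      input reads are FASTQ
#   -p 16   use 16 threads
#   -e 100  maximum sum of quality values at mismatched positions
#   -l 20   seed length of 20 bases
# 'repbase_index' (index prefix), 'reads.fastq' (input), and 'output.sam' (output) are placeholders
bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > output.sam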
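
The next step needs only the reads that did not map to Repbase. The source does not say how these were extracted, so the following is one possible sketch: it assumes the SAM file still contains unaligned reads (marked with FLAG bit 0x4) and converts them back to FASTQ with awk. `extract_unaligned_fastq` is a hypothetical helper name.

```shell
# Sketch: pull unaligned reads out of a SAM file and write them as FASTQ.
# extract_unaligned_fastq is a hypothetical helper, not part of Bowtie or the source pipeline.
# Usage: extract_unaligned_fastq FILE.sam > unaligned.fastq
extract_unaligned_fastq() {
  awk -F '\t' '
    /^@/ { next }                    # skip SAM header lines
    int($2 / 4) % 2 == 1 {           # FLAG bit 0x4 set: read is unaligned
      # SAM columns: 1 = read name, 10 = sequence, 11 = base qualities
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }' "$1"
}

# Example call (file names are the placeholders used in the surrounding examples):
# extract_unaligned_fastq output.sam > unmapped_to_repbase_reads.fastq
```

Alternatively, Bowtie's `--un <file>` option writes unaligned reads to a separate file during alignment; that option is not among the parameters reported in the source.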


Reads not mapped to Repbase sequences were aligned to the mm9 or hg19 genome (UCSC assembly) using bowtie version 1.1.1 with parameters -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128

Bowtie v1.1.1

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.1.1

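# Placeholder for Bowtie index (replace with actual path to mm9 or hg19 index)
# The index would typically be built once using 'bowtie-build'
# For example: bowtie-build <path_to_mm9_fasta> mm9_index
# Note: the processing text names mm9 for mouse, but the dataset's Genome_build field lists mm10;
# confirm which assembly applies before building the index.
GENOME_INDEX="mm9_index" # Or "hg19_index" depending on the specific genome used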

# Placeholder for input reads file (reads not mapped to Repbase sequences)
INPUT_READS="unmapped_to_repbase_reads.fastq"

# Placeholder for output SAM file
OUTPUT_SAM="aligned_to_genome.sam"

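# Align reads to the mm9 or hg19 genome.
# -S is not among the parameters reported in the source; it is added here as an assumption
# so that the output is SAM (without it, Bowtie writes its own default alignment format).
bowtie -S -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128 "${GENOME_INDEX}" "${INPUT_READS}" > "${OUTPUT_SAM}"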
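
A quick check of how many reads aligned to the genome can catch problems such as a wrong index before counting. The sketch below is illustrative only and assumes the alignments were written as SAM (Bowtie's `-S` option) with at most one record per read, as `-k 1` gives. `sam_alignment_summary` is a hypothetical helper name.

```shell
# Sketch: report total, aligned, and unaligned read counts from a SAM file.
# sam_alignment_summary is a hypothetical helper, not part of Bowtie or the source pipeline.
# Usage: sam_alignment_summary FILE.sam
sam_alignment_summary() {
  awk -F '\t' '
    /^@/ { next }                                   # skip SAM header lines
    {
      total++
      if (int($2 / 4) % 2 == 1) unaligned++         # FLAG bit 0x4 set: read is unaligned
    }
    END {
      aligned = total - unaligned
      pct = (total > 0) ? 100 * aligned / total : 0
      printf "total\t%d\naligned\t%d\nunaligned\t%d\npercent_aligned\t%.2f\n",
             total, aligned, unaligned, pct
    }' "$1"
}

# Example call (file name is the placeholder used in the alignment example):
# sam_alignment_summary aligned_to_genome.sam
```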


Read counts for each gene annotated in GENCODE vM3 (mouse) or v19 (human) were calculated with featureCounts.

featureCounts v2.0.3 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install featureCounts (part of Subread package)
# conda install -c bioconda subread

# Command for mouse (gencode vM3)
# Replace input_mouse_1.bam input_mouse_2.bam with your actual mouse BAM file(s)
# Replace /path/to/gencode.vM3.annotation.gtf with the actual path to the Gencode vM3 GTF file
featureCounts -a /path/to/gencode.vM3.annotation.gtf \
              -o mouse_gene_counts.txt \
              -F GTF \
              -t exon \
              -g gene_id \
              -s 0 \
              -T 8 \
              input_mouse_1.bam input_mouse_2.bam

# Command for human (gencode v19)
# Replace input_human_1.bam input_human_2.bam with your actual human BAM file(s)
# Replace /path/to/gencode.v19.annotation.gtf with the actual path to the Gencode v19 GTF file
featureCounts -a /path/to/gencode.v19.annotation.gtf \
              -o human_gene_counts.txt \
              -F GTF \
              -t exon \
              -g gene_id \
              -s 0 \
              -T 8 \
              input_human_1.bam input_human_2.bam
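
The dataset's supplementary files are described as CSV count files, whereas featureCounts writes a tab-delimited table with a leading comment line and six annotation columns (Geneid, Chr, Start, End, Strand, Length) before the per-sample counts. How the authors produced their CSV is not stated; the sketch below shows one possible conversion. `featurecounts_to_csv` is a hypothetical helper name, and it assumes that gene IDs and sample names contain no commas.

```shell
# Sketch: convert a featureCounts table to a gene-by-sample CSV.
# featurecounts_to_csv is a hypothetical helper, not part of the Subread package.
# Usage: featurecounts_to_csv mouse_gene_counts.txt > mouse_gene_counts.csv
featurecounts_to_csv() {
  awk -F '\t' '
    /^#/ { next }                                 # skip the "# Program:..." comment line
    {
      line = $1                                   # column 1: Geneid
      for (i = 7; i <= NF; i++) line = line "," $i   # columns 7+: per-sample counts
      print line
    }' "$1"
}

# Example call (file name is the placeholder used in the featureCounts example):
# featurecounts_to_csv mouse_gene_counts.txt > mouse_gene_counts.csv
```

The header row is kept, so the first CSV line lists `Geneid` followed by the input BAM names as given on the featureCounts command line.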

Raw Source Text

Sequencing reads from small RNA-seq libraries were first trimmed adapters using cutadapt
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the mm9 or hg19 genome (UCSC assembly) using bowtie version 1.1.1 with parameters -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128
counts of reads for each gene annotated in gencode vM3 were calculated from featureCounts for mouse and v19 for human
Genome_build: mm10/hg19 for mouse and human, respectively
Supplementary_files_format_and_content: count file, csv
