GSE102497 Processing Pipeline

miRNA-Seq, 5 steps

Publication

Systematic Discovery of RNA Binding Proteins that Regulate MicroRNA Levels.

Molecular Cell (2018), PMID 29547715

Dataset

GSE102497

Systematic discovery of RNA binding proteins that control microRNA processing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Sequencing reads from small RNA-seq libraries were first trimmed of adapters using cutadapt

    cutadapt, version not stated (Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt
    
    # Define input and output file names
    INPUT_READS="small_rna_seq_raw.fastq.gz"
    OUTPUT_READS="small_rna_seq_trimmed.fastq.gz"
    
    # Define the 3' adapter sequence (e.g., Illumina TruSeq Small RNA 3' adapter)
    # This adapter sequence might need to be adjusted based on the specific library preparation kit used.
    ADAPTER_SEQUENCE="TGGAATTCTCGGGTGCCAAGGAACTCCAG"
    
    # Trim adapters, filter for minimum length, and perform quality trimming
    # -a: 3' adapter sequence
    # -m: Minimum read length after trimming (e.g., 18 bp for small RNA)
    # -q: Quality cutoff for trimming (e.g., 20)
    # -o: Output file for trimmed reads
    cutadapt \
      -a "${ADAPTER_SEQUENCE}" \
      -m 18 \
      -q 20 \
      -o "${OUTPUT_READS}" \
      "${INPUT_READS}"
  2.

    Reads were then mapped against a database of repetitive elements derived from RepBase18.05.

    Bowtie v1.0.0 (from the source text; the index-building command is inferred) GitHub
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie=1.0.0
    
    # Obtain the RepBase 18.05 sequences as FASTA
    # RepBase requires a license for access, so no download command is given here.
    # RepBase18.05.fasta is a placeholder name for the uncompressed FASTA of repetitive elements.
    
    # Build a Bowtie 1 index for the repetitive elements
    # The source text states that Bowtie 1.0.0 was used for this mapping, so the index is built
    # with bowtie-build; an index built with bowtie2-build cannot be read by Bowtie 1.
    # 'repbase_index' is the index base name assumed by the alignment command in the next step.
    bowtie-build RepBase18.05.fasta repbase_index
  3.

    Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).

    Bowtie v1.0.0 GitHub
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie=1.0.0
    
    # Align reads using Bowtie
    # Assuming 'repbase_index' is the base name for the Bowtie index generated from Repbase sequences
    # Assuming 'reads.fastq' is the input FASTQ file
    # Output will be in SAM format due to -S
    bowtie -S -q -p 16 -e 100 -l 20 repbase_index reads.fastq > alignment.sam
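
The next step starts from reads that did not map to Repbase, but the parameters listed for this step do not say how those reads were separated. Bowtie offers an --un option that writes unaligned reads to a file directly. As an alternative, here is a minimal sketch that recovers them from the SAM output, assuming unaligned reads are present in the SAM file with flag bit 0x4 set. The function name sam_unmapped_to_fastq and the file names are hypothetical.

```shell
# Convert unmapped SAM records (flag bit 0x4 set) read from stdin into FASTQ on stdout.
# Uses integer arithmetic for the bit test so it works in any POSIX awk.
sam_unmapped_to_fastq() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    int($2 / 4) % 2 == 1 {             # FLAG column: is the 0x4 (unmapped) bit set?
      # columns: 1 = read name, 10 = sequence, 11 = base qualities
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }'
}

# Hypothetical usage with the file names from the example above:
# sam_unmapped_to_fastq < alignment.sam > unmapped_reads.fastq
```

The output file name matches the INPUT_FASTQ placeholder used in the genome alignment example.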
    
  4.

    Reads not mapped to Repbase sequences were aligned to the hg19 genome (UCSC assembly) using bowtie version 1.1.1 with parameters -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128

    Bowtie v1.1.1 GitHub
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie=1.1.1
    
    # Define input and output files
    # INPUT_FASTQ represents 'Reads not mapped to Repbase sequences'
    INPUT_FASTQ="unmapped_reads.fastq"
    OUTPUT_SAM="aligned_to_hg19.sam"
    
    # Define the path to the hg19 Bowtie index
    # Replace /path/to/hg19 with the actual path to your Bowtie index files (e.g., if your index files are hg19.1.ebwt, hg19.2.ebwt, etc., then the prefix is hg19).
    BOWTIE_INDEX_PREFIX="/path/to/hg19"
    
    # Align reads to the hg19 genome using Bowtie
    # Assumption: -S is NOT among the parameters listed for this step in the source text.
    # It is added here because Bowtie 1 otherwise writes its default tab-delimited format
    # rather than SAM, and featureCounts in the next step expects SAM/BAM input.
    bowtie -S -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128 "${BOWTIE_INDEX_PREFIX}" "${INPUT_FASTQ}" > "${OUTPUT_SAM}"
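
The trimming example keeps reads of at least 18 nt (an inferred setting), and the two Bowtie commands set seed lengths of 20 and 25. A quick look at the read-length distribution of the FASTQ passed to Bowtie can therefore be a useful sanity check. The sketch below is not part of the published pipeline. The function name fastq_length_histogram is hypothetical, and the code assumes the common four-lines-per-record FASTQ layout.

```shell
# Print "length<TAB>count" for every read length in a FASTQ stream on stdin,
# sorted by length. Assumes 4 lines per record (sequences are not wrapped).
fastq_length_histogram() {
  awk 'NR % 4 == 2 { n[length($0)]++ }                       # line 2 of each record is the sequence
       END { for (len in n) printf "%d\t%d\n", len, n[len] }' | sort -n
}

# Hypothetical usage with the placeholder file name from the example above:
# fastq_length_histogram < unmapped_reads.fastq
# For a gzip-compressed file: gzip -dc reads.fastq.gz | fastq_length_histogram
```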
  5.

    Counts of reads for each miRNA were calculated with featureCounts using miRBase annotations.

    featureCounts v2.0.6 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install subread (which includes featureCounts)
    # conda install -c bioconda subread=2.0.6
    
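    # Download miRBase annotations for Homo sapiens
    # The annotation coordinates must match the hg19 (GRCh37) alignment. miRBase releases from 21 onward
    # report human coordinates on GRCh38, so a release with GRCh37 coordinates (release 20 or earlier)
    # is assumed to be needed rather than CURRENT; the release used by the authors is not stated.
    # wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3 -O miRBase_hsa.gff3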
    
    # Example input BAM files (replace with actual paths)
    # Assuming aligned reads are in 'aligned_bams/' directory
    INPUT_BAMS="aligned_bams/sample1.bam aligned_bams/sample2.bam"
    MIRBASE_GFF="miRBase_hsa.gff3"
    OUTPUT_COUNTS="miRNA_counts.txt"
    
    # Calculate counts of reads for each miRNA using featureCounts
    # -a: Annotation file
    # -F GTF: Annotation format; featureCounts documents 'GTF' (or compatible GFF) and 'SAF' as the accepted values, so the GFF3 file is passed as GTF
    # -t miRNA: Feature type to count (assuming 'miRNA' is the feature type in the GFF3)
    # -g ID: Attribute used to group features into genes (e.g., miRNA ID)
    # -s 0: Unstranded (0), use 1 for stranded, 2 for reverse stranded if known
    # -T 8: Use 8 threads
    # -o: Output file
    featureCounts -a "${MIRBASE_GFF}" -F GTF -t miRNA -g ID -s 0 -T 8 -o "${OUTPUT_COUNTS}" ${INPUT_BAMS}
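
The featureCounts output file begins with a comment line recording the command. It then has a header row and six annotation columns (Geneid, Chr, Start, End, Strand, Length), followed by one count column per input file. As a minimal sketch, the function below reduces that file to a plain count matrix like the "count file, txt" listed as the supplementary format. The function name featurecounts_to_matrix is hypothetical, and whether the authors post-processed their counts this way is not stated.

```shell
# Reduce featureCounts output (stdin) to "feature ID + per-sample counts" (stdout).
# Drops the leading '#' comment line and the Chr/Start/End/Strand/Length columns.
featurecounts_to_matrix() {
  awk -F '\t' -v OFS='\t' '
    /^#/ { next }                                  # skip the command-line comment
    {
      line = $1                                    # Geneid (or feature ID) column
      for (i = 7; i <= NF; i++) line = line OFS $i # count columns start at column 7
      print line
    }'
}

# Hypothetical usage with the output name from the example above:
# featurecounts_to_matrix < miRNA_counts.txt > miRNA_count_matrix.txt
```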
Raw Source Text
Sequencing reads from small RNA-seq libraries were first trimmed of adapters using cutadapt
Reads were then mapped against a database of repetitive elements derived from RepBase18.05. Bowtie version 1.0.0 with parameters -S -q -p 16 -e 100 -l 20 was used to align reads against an index generated from Repbase sequences (Langmead et al., 2009).
Reads not mapped to Repbase sequences were aligned to the hg19 genome (UCSC assembly) using bowtie version 1.1.1 with parameters -p 8 -k 1 -m 10 -l 25 --best --chunkmbs 128
counts of reads for each miRNA were calculated from featureCounts using miRBase annotations
Genome_build: hg19
Supplementary_files_format_and_content: count file, txt