GSE262542 Processing Pipeline

RNA-Seq, code examples, 4 steps

Publication

Evaluation of novel computational methods to identify RNA-binding protein footprints from structural data.

RNA (New York, N.Y.) (2025) — PMID 40399037

Dataset

GSE262542

Evaluation of novel computational methods that identify RNA-binding protein footprints from structural data

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Data processing was done using the Skipper pipeline, freely available at https://github.com/yeolab/skipper.

    Skipper (version not specified)
    $ Bash example
    # Clone the Skipper pipeline repository
    git clone https://github.com/yeolab/skipper.git
    cd skipper
    
    # Skipper is a Snakemake workflow. It needs a configuration file and the input
    # FASTQ files described in the repository README; the exact file names and
    # required settings (e.g., the hg38 reference used for this dataset) are defined there.
    
    # Generic Snakemake invocation, shown as an assumption: the Skipper README
    # documents the actual Snakefile name and any cluster options.
    # --cores sets the number of CPU cores to use.
    # --use-conda lets Snakemake manage software environments via Conda.
    snakemake --cores 8 --use-conda
  2.

    Adapter trimming was done with Skewer.

    Skewer v0.2.2
    $ Bash example
    # Install Skewer (example using conda)
    # conda install -c bioconda skewer
    
    # Define input and output file names
    INPUT_FASTQ="input.fastq.gz"
    OUTPUT_PREFIX="trimmed_reads"
    ADAPTER_FILE="adapters.fa" # Placeholder for adapter sequences file (e.g., containing Illumina adapters)
    
    # Execute Skewer for adapter trimming
    # The values below are illustrative; the source does not state the settings used.
    # -x: adapter sequence, or FASTA file of adapter sequences
    # -l: minimum read length allowed after trimming
    # -q: trim the 3' end until this base quality or higher is reached
    # -m: trimming mode (tail = 3' adapters on single-end reads)
    # -z: gzip-compress the output
    # -o: output file prefix
    skewer -x "${ADAPTER_FILE}" -l 18 -q 20 -m tail -z -o "${OUTPUT_PREFIX}" "${INPUT_FASTQ}"
  3.

    Processed reads were mapped with STAR (2.7.10a_alpha_220314).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Placeholder for reference genome directory
    # Replace with your actual STAR genome index path (e.g., for hg38)
    GENOME_DIR="/path/to/STAR_genome_index/hg38"
    
    # Placeholder for input FASTQ files
    # Replace with your actual input FASTQ files (e.g., processed reads)
    # Assuming paired-end reads, adjust for single-end if necessary
    READS_R1="processed_reads_R1.fastq.gz"
    READS_R2="processed_reads_R2.fastq.gz"
    
    # Placeholder for output directory and prefix
    OUTPUT_DIR="star_mapping_output"
    OUTPUT_PREFIX="${OUTPUT_DIR}/aligned_reads"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run STAR alignment
    # ENCODE-style options shown for illustration; the source does not state the parameters used
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_R1}" "${READS_R2}" \
         --runThreadN 8 \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverLmax 0.1 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000 \
         --readFilesCommand zcat
  4.

    PCR bias was removed using UMIcollapse.

    UMIcollapse (version not available)
    $ Bash example
    # UMICollapse is a Java tool: https://github.com/Daniel-Liu-c0deb0t/UMICollapse
    # Install it (example using conda, assuming the bioconda wrapper script):
    # conda install -c bioconda umicollapse
    
    # Example usage (the options used for this dataset are not stated in the source):
    # input.bam should be coordinate-sorted and indexed, with the UMI at the end of each read name.
    # output.bam receives the deduplicated alignments.
    umicollapse bam -i input.bam -o output.bam
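
To make the PCR-deduplication step above more concrete, here is a minimal sketch of the idea in plain shell and awk: reads that share the same chromosome, position, strand, and UMI are treated as PCR duplicates and only the first is kept. It is an exact-match simplification for illustration only, not how UMIcollapse is implemented (that tool can also merge UMIs that differ by a small edit distance). The function names, the record layout, and the toy reads are all hypothetical, and the UMI is assumed to be the suffix after the last "_" in the read name.

```shell
#!/bin/sh
set -eu

# Keep the first record for each (chrom, pos, strand, UMI) combination.
# Input: TSV with columns read_name, chrom, pos, strand (hypothetical layout).
dedup_by_umi() {
  awk -F'\t' -v OFS='\t' '{
    n = split($1, parts, "_")          # UMI assumed to follow the last "_"
    umi = parts[n]
    key = $2 FS $3 FS $4 FS umi
    if (!(key in seen)) { seen[key] = 1; print }
  }'
}

# Made-up reads used only to exercise the function.
toy_reads() {
  printf 'r1_ACGT\tchr1\t100\t+\n'
  printf 'r2_ACGT\tchr1\t100\t+\n'   # same position and UMI as r1: duplicate
  printf 'r3_TTGA\tchr1\t100\t+\n'   # same position, different UMI: kept
  printf 'r4_ACGT\tchr1\t250\t+\n'   # same UMI, different position: kept
}

deduped=$(toy_reads | dedup_by_umi)
printf '%s\n' "$deduped"
```

On the toy input, r2 is dropped and r1, r3, and r4 are kept.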

Raw Source Text
data processing was done using the Skipper pipeline, freelly available at https://github.com/yeolab/skipper.
Adapters trimming was done with Skewer
proccessed reads were mapped with STAR (2.7.10a_alpha_220314)
PCR bias was removed using UMIcollapse
Assembly: hg38
Supplementary files format and content: tab separated values files
Supplementary files format and content: Supplementary files - reproducible enriched window including p and q values, enrichment scores, and the annotated regions for the significantly bound transcriptome tiled windows.
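
Because the supplementary tables described above are tab-separated and carry p and q values, a common first step is to keep only windows below a q-value cutoff. The sketch below shows one way to do that with awk by looking up the column by its header name. The column names (window, enrichment, pvalue, qvalue), the 0.05 cutoff, the function name, and the toy rows are assumptions for illustration; check the header of the actual files before reusing it.

```shell
#!/bin/sh
set -eu

# Print the header plus every row whose q-value column is <= the threshold.
# $1: TSV path, $2: q-value column name, $3: threshold
filter_by_qvalue() {
  awk -F'\t' -v OFS='\t' -v col="$2" -v thr="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next
    }
    ($idx + 0) <= (thr + 0)
  ' "$1"
}

# Made-up table standing in for a supplementary file.
tsv=$(mktemp)
trap 'rm -f "$tsv"' EXIT
{
  printf 'window\tenrichment\tpvalue\tqvalue\n'
  printf 'chr1:100-200\t3.2\t0.0001\t0.001\n'
  printf 'chr1:200-300\t1.1\t0.15\t0.2\n'
  printf 'chr2:500-600\t2.4\t0.01\t0.04\n'
} > "$tsv"

significant=$(filter_by_qvalue "$tsv" qvalue 0.05)
printf '%s\n' "$significant"
```

Looking the column up by name rather than by position keeps the filter working if the real files order their columns differently.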