GSE262542 Processing Pipeline

RNA-Seq, 4 processing steps

Publication

Evaluation of novel computational methods to identify RNA-binding protein footprints from structural data.

RNA (New York, N.Y.) (2025) — PMID 40399037

Dataset

Evaluation of novel computational methods that identify RNA-binding protein footprints from structural data

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Data processing was done using the Skipper pipeline, freely available at https://github.com/yeolab/skipper.

Skipper (version not specified)

$ Bash example

# Clone the Skipper pipeline repository
git clone https://github.com/yeolab/skipper.git
cd skipper

# Skipper is a Snakemake workflow. It needs a configuration file and input data
# (e.g., FASTQ files); the configuration sets parameters such as the reference
# genome (hg38 for this dataset).
# The exact configuration and manifest file names are not given in the source;
# prepare them according to the repository README.

# Execute the Skipper pipeline using Snakemake (illustrative invocation).
# --cores sets the number of CPU cores to use.
# --use-conda lets Snakemake manage software environments via Conda.
# If the workflow file is not named 'Snakefile', pass it with -s/--snakefile.
snakemake --cores 8 --use-conda


Adapter trimming was done with Skewer.

Skewer v0.2.2

$ Bash example

# Install Skewer (example using conda)
# conda install -c bioconda skewer

# Define input and output file names
INPUT_FASTQ="input.fastq.gz"
OUTPUT_PREFIX="trimmed_reads"
ADAPTER_FILE="adapters.fa" # Placeholder for adapter sequences file (e.g., containing Illumina adapters)

# Execute Skewer for adapter trimming
# Parameter values are illustrative; the source does not report the settings used
# -x: Adapter sequence, or FASTA file of adapter sequences
# -m: Trimming mode ('tail' trims 3' adapters from single-end reads)
# -l: Minimum read length allowed after trimming
# -q: Trim the 3' end until this base quality (or higher) is reached
# -o: Base name for output files
skewer -x "${ADAPTER_FILE}" -m tail -l 18 -q 20 -o "${OUTPUT_PREFIX}" "${INPUT_FASTQ}"
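
To make the effect of adapter removal and the minimum-length filter (-l) concrete, here is a toy sketch in plain Bash and awk. It is not Skewer's algorithm: it only removes an exact copy of the adapter, whereas Skewer also tolerates mismatches and indels. The adapter prefix, the reads, and the function name trim_fastq are all invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy illustration only: exact-match 3' adapter removal plus a minimum-length
# filter. Skewer itself also tolerates mismatches and indels; this does not.
ADAPTER="AGATCGGAAGAGC"   # assumed adapter prefix, for illustration
MIN_LEN=18                # mirrors the -l value used above

trim_fastq() {
  # Reads 4-line FASTQ records on stdin, writes trimmed records to stdout.
  awk -v adapter="$ADAPTER" -v min_len="$MIN_LEN" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      qual = $0
      pos = index(seq, adapter)          # 0 when the adapter is absent
      if (pos > 0) {
        seq  = substr(seq, 1, pos - 1)   # keep bases before the adapter
        qual = substr(qual, 1, pos - 1)  # keep the matching qualities
      }
      if (length(seq) >= min_len)
        printf "%s\n%s\n+\n%s\n", header, seq, qual
    }'
}

# Three invented reads: adapter after 20 bases, no adapter, adapter after 5 bases.
TOY_FASTQ=$(cat <<'EOF'
@read1
ACGTACGTACGTACGTACGTAGATCGGAAGAGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
TTTTGGGGCCCCAAAATTTTGGGG
+
IIIIIIIIIIIIIIIIIIIIIIII
@read3
ACGTAAGATCGGAAGAGC
+
IIIIIIIIIIIIIIIIII
EOF
)

TRIMMED=$(printf '%s\n' "$TOY_FASTQ" | trim_fastq)
printf '%s\n' "$TRIMMED"
```

In this toy input, read1 is clipped to its first 20 bases, read2 passes through unchanged, and read3 is discarded because only 5 bases remain after clipping.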


Processed reads were mapped with STAR (2.7.10a_alpha_220314).

STAR v2.7.10a

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Placeholder for reference genome directory
# Replace with your actual STAR genome index path (e.g., for hg38)
GENOME_DIR="/path/to/STAR_genome_index/hg38"

# Placeholder for input FASTQ files
# Replace with your actual input FASTQ files (e.g., processed reads)
# Assuming paired-end reads, adjust for single-end if necessary
READS_R1="processed_reads_R1.fastq.gz"
READS_R2="processed_reads_R2.fastq.gz"

# Placeholder for output directory and prefix
# STAR appends its file names directly to the prefix, hence the trailing underscore
OUTPUT_DIR="star_mapping_output"
OUTPUT_PREFIX="${OUTPUT_DIR}/aligned_reads_"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run STAR alignment
# Illustrative RNA-seq parameters; the source does not report the settings used
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_R1}" "${READS_R2}" \
     --runThreadN 8 \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 999 \
     --outFilterMismatchNoverLmax 0.1 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --readFilesCommand zcat
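
After a run like the one above, STAR writes a summary named Log.final.out next to the other outputs, using the same prefix. As a hedged sketch of how one might pull mapping statistics out of that file, the helper below (star_log_value, a hypothetical name) reads the "label | value" layout. The mock log is invented for illustration and its numbers are not results from this dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the value of one field from a STAR Log.final.out file.
# $1 = path to Log.final.out, $2 = field label (text left of the '|').
star_log_value() {
  awk -F'|' -v label="$2" '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # strip padding around the label
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # strip the tab before the value
      if (key == label) { print val; exit }
    }' "$1"
}

# Mock excerpt with invented numbers, laid out like STAR's summary log.
MOCK_LOG=$(mktemp)
printf '%s |\t%s\n' \
  "                          Number of input reads" "1000" \
  "                   Uniquely mapped reads number" "850" \
  "                        Uniquely mapped reads %" "85.00%" \
  > "$MOCK_LOG"

INPUT_READS=$(star_log_value "$MOCK_LOG" "Number of input reads")
UNIQUE_PCT=$(star_log_value "$MOCK_LOG" "Uniquely mapped reads %")
echo "input reads: ${INPUT_READS}, uniquely mapped: ${UNIQUE_PCT}"
rm -f "$MOCK_LOG"
```

With a real run, the first argument would be the path formed by the output prefix followed by Log.final.out.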


PCR bias was removed using UMIcollapse.

UMIcollapse (version not specified)

$ Bash example

# UMICollapse is a Java tool (github.com/Daniel-Liu-c0deb0t/UMICollapse), not a Python script.
# Install it (example using conda), or build the jar from the GitHub repository:
# conda install -c bioconda umicollapse

# Example usage (flags per the UMICollapse README; verify against your installed version):
# Replace input.bam with your coordinate-sorted BAM file. By default the UMI is
# expected at the end of each read name, after a '_' separator.
# Replace output.bam with your desired output deduplicated BAM file.
umicollapse bam -i input.bam -o output.bam
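
The idea behind UMI-based removal of PCR duplicates can be sketched with a toy table: reads that share the same mapping position, strand, and UMI are treated as copies of one molecule, and only the first is kept. This is an exact-match simplification, not UMICollapse's algorithm, which can also merge UMIs that differ by a small number of mismatches. The column layout, the reads, and the function name dedup_by_umi are invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy illustration only: keep one read per (chromosome, position, strand, UMI).
dedup_by_umi() {
  # Input columns (tab-separated, hypothetical): read_id chrom pos strand umi
  awk -F'\t' '!seen[$2 FS $3 FS $4 FS $5]++'
}

# Five invented reads: r1, r2 and r5 share position and UMI,
# r3 has a different UMI, r4 maps to a different position.
TOY_READS=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  r1 chr1 100 + ACGTAC \
  r2 chr1 100 + ACGTAC \
  r3 chr1 100 + TTGCAA \
  r4 chr1 250 + ACGTAC \
  r5 chr1 100 + ACGTAC)

DEDUPED=$(printf '%s\n' "$TOY_READS" | dedup_by_umi)
printf '%s\n' "$DEDUPED"

KEPT=$(printf '%s\n' "$DEDUPED" | grep -c .)
TOTAL=$(printf '%s\n' "$TOY_READS" | grep -c .)
echo "kept ${KEPT} of ${TOTAL} reads"
```

Here r2 and r5 are dropped as duplicates of r1, while r3 and r4 survive because their UMI or position differs.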

Tools Used

Skipper, Skewer, STAR, UMIcollapse

Raw Source Text

data processing was done using the Skipper pipeline, freelly available at https://github.com/yeolab/skipper.
Adapters trimming was done with Skewer
proccessed reads were mapped with STAR (2.7.10a_alpha_220314)
PCR bias was removed using UMIcollapse
Assembly: hg38
Supplementary files format and content: tab separated values files
Supplementary files format and content: Supplementary files - reproducible enriched window including p and q values, enrichment scores, and the annotated regions for the significantly bound transcriptome tiled windows.
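
Because the supplementary files are tab-separated tables that include p and q values, a common first step is to keep only windows below a q-value cutoff. The sketch below shows one way to do that with awk. The column names (window, enrichment, pvalue, qvalue), the 0.05 cutoff, and the function name filter_by_qvalue are assumptions for illustration; check the header of the actual files before using it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the header plus rows whose q-value column is below a cutoff.
# $1 = TSV path, $2 = name of the q-value column, $3 = cutoff.
filter_by_qvalue() {
  awk -F'\t' -v col="$2" -v cutoff="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next
    }
    ($idx + 0) < (cutoff + 0)' "$1"
}

# Invented table; column names and values are hypothetical.
TOY_TSV=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  window   enrichment pvalue qvalue \
  window_a 3.2        0.0001 0.004 \
  window_b 1.1        0.2    0.35 \
  window_c 2.5        0.003  0.04 \
  > "$TOY_TSV"

SIGNIFICANT=$(filter_by_qvalue "$TOY_TSV" qvalue 0.05)
printf '%s\n' "$SIGNIFICANT"
rm -f "$TOY_TSV"
```

Looking the column up by name rather than by position keeps the filter working if the real files order their columns differently.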
