GSE72500 Processing Pipeline

RIP-Seq · 3 steps

Publication

The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression.

Science (New York, N.Y.) (2015) — PMID 26382853

Dataset

GSE72500

Ro60 iCLIP

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Illumina software used for basecalling.

    bcl2fastq v2.20 (tool and version inferred with models/gemini-2.5-flash)
    $ Bash example
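    # Install bcl2fastq (distributed by Illumina; it may not be available on every conda channel)
    # conda install -c bioconda bcl2fastq2  # unverified package name, check the channel first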
    
    # Define input and output directories
    RUN_FOLDER_DIR="/path/to/illumina/run/folder"
    OUTPUT_DIR="/path/to/output/fastq"
    SAMPLE_SHEET="/path/to/sample_sheet.csv" # Optional, but highly recommended for demultiplexing
    
    # Execute bcl2fastq for basecalling and demultiplexing
    bcl2fastq --runfolder-dir "${RUN_FOLDER_DIR}" \
                      --output-dir "${OUTPUT_DIR}" \
                      --sample-sheet "${SAMPLE_SHEET}" \
                      --no-lane-splitting # Example common parameter, adjust as needed
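
After demultiplexing, it can help to confirm that each FASTQ file holds a plausible number of reads before mapping. The following is a minimal sketch and not part of the published pipeline: the function name `count_fastq_reads` is hypothetical, and it assumes standard four-line FASTQ records.

```shell
# Count reads in a FASTQ file (plain or gzip-compressed).
# Assumption: every record spans exactly four lines.
count_fastq_reads() {
    fastq="$1"
    case "${fastq}" in
        *.gz) lines=$(gzip -dc "${fastq}" | wc -l) ;;
        *)    lines=$(wc -l < "${fastq}") ;;
    esac
    echo $(( lines / 4 ))
}

# Usage (hypothetical file name):
# count_fastq_reads "/path/to/output/fastq/sample_R1.fastq.gz"
```
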
  2.

    Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence.

    STAR v2.4.x (version inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    GENOME_DIR="/path/to/STAR_index/hg19" # Path to the STAR genome index for hg19
    READS="input.fastq.gz" # Input FASTQ file (assuming single-end for this example)
    OUTPUT_PREFIX="mapped_reads" # Prefix for output files
    
    # 1. Map reads to human genome build hg19 using STAR
    #    Parameters: "outFilterMultimapNmax 20"
    #    --readFilesCommand zcat is required because the input FASTQ is gzip-compressed
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}." \
         --outFilterMultimapNmax 20 \
         --outSAMtype BAM SortedByCoordinate \
         --runThreadN 8 # Adjust number of threads as needed
    
    # 2. Remove PCR duplicates using unique nmers in the barcode sequence
    #    This step typically involves UMI (Unique Molecular Identifier) based deduplication.
    #    The exact command depends on how the barcode sequence is incorporated into the reads
    #    (e.g., in the read header, or at the start of the read sequence).
    #    A common tool for this is `umi_tools dedup`.
    
    # Install umi_tools (if not already installed)
    # conda install -c bioconda umi_tools
    
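    # Example using umi_tools dedup (assuming the UMI was appended to the read name by a previous extraction step).
    # umi_tools dedup needs a coordinate-sorted, indexed BAM, so index the STAR output first:
    # samtools index "${OUTPUT_PREFIX}.Aligned.sortedByCoord.out.bam"
    # umi_tools dedup \
    #     --stdin "${OUTPUT_PREFIX}.Aligned.sortedByCoord.out.bam" \
    #     --stdout "${OUTPUT_PREFIX}.deduplicated.bam" \
    #     --extract-umi-method=read_id \
    #     --umi-separator="_" \
    #     --log "${OUTPUT_PREFIX}.deduplication.log"
    # "_" is the separator written by `umi_tools extract`; change --umi-separator if your read names differ.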
    # If reads are paired-end, add --paired. If UMI needs to be extracted from the read sequence first, use `umi_tools extract` prior to STAR.
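
The idea behind barcode-based duplicate removal can be sketched without any dedicated tool: reads that share the same mapping position, strand, and random barcode are treated as PCR copies of one molecule, and only one of them is kept. The sketch below illustrates that logic under stated assumptions and is not the authors' script. It expects a hypothetical tab-separated input with the columns chromosome, start, strand, and barcode, and the function name `collapse_duplicates` is made up for this example. Unlike `umi_tools dedup`, it does not tolerate sequencing errors in the barcode.

```shell
# Keep the first read seen for each (chromosome, start, strand, barcode) combination.
# Input on stdin: tab-separated lines "chrom<TAB>start<TAB>strand<TAB>barcode".
collapse_duplicates() {
    awk -F '\t' '!seen[$1 FS $2 FS $3 FS $4]++'
}

# Usage (hypothetical file names):
# collapse_duplicates < reads_with_barcodes.tsv > reads_deduplicated.tsv
```

Reads with the same position but different barcodes survive, which is the point of the random barcode: they are counted as independent molecules rather than amplification copies.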
  3.

    Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.

    RefSeq v0.1.1
    $ Bash example
    # Install pyicoteo (which includes pyicoclip)
    # pip install pyicoteo
    
    # Placeholder for input BAM file (aligned reads)
    # Replace with your actual input BAM file
    INPUT_BAM="input.bam"
    
    # Placeholder for RefSeq genes region file (e.g., BED format)
    # This file defines the regions where peaks will be called.
    # Example: Download RefSeq genes for the assembly used for mapping (hg19 in this pipeline)
    # from resources like UCSC Table Browser, Ensembl, or NCBI.
    REFSEQ_GENES_BED="refseq_genes.bed"
    
    # Placeholder for genome FASTA file
    # Replace with your actual genome FASTA file (e.g., hg19.fa)
    GENOME_FASTA="genome.fa"
    
    # Output prefix for peak files
    OUTPUT_PREFIX="pyicoclip_peaks"
    
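    # Execute pyicoclip (installed as its own command by the Pyicoteo package).
    # The argument order and flags below are unverified; confirm them with `pyicoclip --help`.
    pyicoclip "${INPUT_BAM}" "${OUTPUT_PREFIX}.pk" -f bam --region "${REFSEQ_GENES_BED}"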
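
The RefSeq region file can be derived from an annotation table. The sketch below assumes the UCSC `refGene.txt` layout (tab-separated, with transcript name, chromosome, strand, txStart, and txEnd in columns 2 to 6) and writes six-column BED. The function name `refgene_to_bed` is hypothetical, and whether pyicoclip expects exactly this BED layout should be checked against its documentation.

```shell
# Convert a UCSC refGene.txt table to six-column BED.
# Assumed input columns: 2 = transcript name, 3 = chromosome, 4 = strand,
# 5 = txStart (0-based), 6 = txEnd (exclusive), matching BED coordinates.
refgene_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" } { print $3, $5, $6, $2, 0, $4 }' "$1"
}

# Usage (hypothetical file names):
# refgene_to_bed refGene.txt | sort -k1,1 -k2,2n > refseq_genes.bed
```
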

Raw Source Text
Illumina software used for basecalling.
Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence. Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.
Genome_build: GRCh37 (hg19)
Supplementary_files_format_and_content: Bed files include peaks.