GSE72420 Processing Pipeline


Publication

The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression.

Science (New York, N.Y.) (2015) — PMID 26382853

Dataset

The Ro60 Autoantigen Binds Endogenous Retroelements and Regulates Inflammatory Gene Expression

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Illumina software used for basecalling.

bcl2fastq v2.20 (tool and version inferred with models/gemini-2.5-flash; the source names only "Illumina software")

$ Bash example

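# Install bcl2fastq2 (distributed by Illumina; availability through conda channels varies, so it is not assumed here)
# e.g., download the installer from Illumina's software downloads page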

# Example bcl2fastq command for basecalling.
# Replace /path/to/runfolder with the actual path to your Illumina run folder (containing BCL files).
# Replace /path/to/output_fastq with the desired output directory for FASTQ files.
# No parameters are given in the description; --no-lane-splitting and --barcode-mismatches 1 below are illustrative choices, not the authors' settings.
bcl2fastq --runfolder-dir /path/to/runfolder --output-dir /path/to/output_fastq --no-lane-splitting --barcode-mismatches 1


Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence.

STAR (version not provided; the Google Code link suggests an older release)

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Placeholder for STAR genome index directory for hg19
# Replace /path/to/hg19_STAR_index with the actual path to your STAR index.
# If the index is not built, you would first run:
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/hg19_STAR_index \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/hg19.gtf \
#      --sjdbOverhang 100 \
#      --runThreadN <num_threads>

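# Map reads to hg19 using STAR with specified parameters
# Replace input_R1.fastq.gz and input_R2.fastq.gz with your actual input FASTQ files
# (the description does not say whether reads are paired; for single-end data pass one file).
# Replace output_prefix with your desired output file prefix.
# Adjust --runThreadN based on available CPU cores.
# --readFilesCommand zcat is needed because the FASTQ files are gzip-compressed.
STAR --genomeDir /path/to/hg19_STAR_index \
     --readFilesIn input_R1.fastq.gz input_R2.fastq.gz \
     --readFilesCommand zcat \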
     --outFileNamePrefix output_prefix \
     --outFilterMultimapNmax 20 \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes All \
     --runThreadN 8
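
The description also says PCR duplicates were removed using unique n-mers in the barcode sequence, but gives no command for that step. The following is a minimal sketch of the idea, not the authors' script. It assumes (hypothetically) that the random barcode was appended to each read name after a final underscore before mapping, and it treats alignments that share chromosome, leftmost position, strand, and barcode as PCR duplicates. The function name dedup_by_barcode and the file name dedup.bam are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# dedup_by_barcode: read SAM text on stdin, write de-duplicated SAM on stdout.
# ASSUMPTION (hypothetical convention): the random barcode is the text after
# the last "_" in the read name, e.g. "read42_ACGTT".
# SIMPLIFICATION: the leftmost mapped position (SAM column 4) is used for both
# strands; the CIGAR string is not parsed to find the 5' end of reverse reads.
dedup_by_barcode() {
  awk -F'\t' '
    /^@/ { print; next }                  # pass header lines through unchanged
    int($2 / 4) % 2 == 1 { next }         # skip unmapped reads (FLAG bit 0x4)
    {
      n = split($1, parts, "_")           # barcode = last "_"-separated field
      barcode = parts[n]
      strand = (int($2 / 16) % 2 == 1) ? "-" : "+"   # FLAG bit 0x10
      key = $3 SUBSEP $4 SUBSEP strand SUBSEP barcode
      if (!(key in seen)) {               # keep only the first alignment per key
        seen[key] = 1
        print
      }
    }
  '
}

# Example usage (requires samtools; the BAM name follows STAR's output naming
# for --outFileNamePrefix output_prefix):
# samtools view -h output_prefixAligned.sortedByCoord.out.bam \
#   | dedup_by_barcode \
#   | samtools view -b -o dedup.bam -
```

Because the key includes the mapped position, each alignment of a multi-mapped read (up to 20 with the setting above) is evaluated on its own. Reads at the same position with different barcodes are kept as independent molecules.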


Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.

pyicoclip (part of Pyicoteo; version not provided), with RefSeq genes as the region file

$ Bash example

# Install Pyicoteo (which includes pyicoclip)
# pip install pyicoteo

# Placeholder for RefSeq genes BED file for hg19, the build the reads were mapped to.
# This file would typically be pre-generated or downloaded from a resource like the UCSC Table Browser.
# Example for hg19 refGene (convert to BED6 format):
# wget -O refGene.txt.gz "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz"
# gunzip refGene.txt.gz
# refGene columns used: $2 = transcript ID, $3 = chrom, $4 = strand, $5 = txStart, $6 = txEnd; the BED score is set to 0.
# awk 'BEGIN{OFS="\t"} {print $3, $5, $6, $2, 0, $4}' refGene.txt | sort -k1,1 -k2,2n > refseq_genes.bed

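# Define the input alignments (after PCR duplicate removal)
# Replace with the actual path to your aligned reads
IP_READS="path/to/your/ip_sample.bam"

# Define the RefSeq genes region file
REFSEQ_REGIONS="path/to/your/refseq_genes.bed" # e.g., the file generated above

# Define the output peak file
OUTPUT_PEAKS="pyicoclip_peaks.pk"

# Execute pyicoclip for peak calling.
# ASSUMPTION: the positional input/output arguments and the -f and --region options follow
# the usage example in the Pyicoteo documentation; the authors' exact command is not given.
# Confirm with `pyicoclip --help`, and use -f sam or -f bed if your version does not read BAM.
# pyicoclip assesses clusters against randomized read placement within the supplied regions,
# so no control sample is passed here.
pyicoclip "${IP_READS}" "${OUTPUT_PEAKS}" -f bam --region "${REFSEQ_REGIONS}"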

View on GitHub

Tools Used

STAR

pyicoclip

Raw Source Text

Illumina software used for basecalling.
Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence. Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.
Genome_build: GRCh37 (hg19)
Supplementary_files_format_and_content: Bed files include peaks.
