GSE72420 Processing Pipeline
Publication
The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression. Science (New York, N.Y.) (2015). PMID 26382853
Dataset
GSE72420: The Ro60 Autoantigen Binds Endogenous Retroelements and Regulates Inflammatory Gene Expression
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Illumina software was used for basecalling.
$ Bash example
# bcl2fastq is distributed by Illumina; install it from Illumina's downloads
# or use your sequencing facility's installation.

# Example bcl2fastq command for converting base calls (BCL) to FASTQ.
# Replace /path/to/runfolder with your Illumina run folder (containing BCL files).
# Replace /path/to/output_fastq with the desired output directory for FASTQ files.
# The source gives no parameters; the options below are common choices,
# not the authors' settings.
bcl2fastq --runfolder-dir /path/to/runfolder \
  --output-dir /path/to/output_fastq \
  --no-lane-splitting \
  --barcode-mismatches 1
2
Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence.
STAR (version not specified in the source; the Google Code link suggests an older release)
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Replace /path/to/hg19_STAR_index with the path to your STAR index for hg19.
# If the index is not built yet, first run:
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/hg19_STAR_index \
#   --genomeFastaFiles /path/to/hg19.fa \
#   --sjdbGTFfile /path/to/hg19.gtf \
#   --sjdbOverhang 100 \
#   --runThreadN <num_threads>

# Map reads to hg19. Only --outFilterMultimapNmax 20 comes from the source;
# the remaining options are illustrative.
# Paired-end input is assumed here; for single-end data pass a single FASTQ file.
# --readFilesCommand zcat is needed because the FASTQ files are gzip-compressed.
# Replace output_prefix with your desired output file prefix.
# Adjust --runThreadN based on available CPU cores.
STAR --genomeDir /path/to/hg19_STAR_index \
  --readFilesIn input_R1.fastq.gz input_R2.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix output_prefix \
  --outFilterMultimapNmax 20 \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outSAMattributes All \
  --runThreadN 8
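The source states that PCR duplicates were removed using unique nmers in the barcode sequence, but it does not name a tool or describe the read layout. The following is a minimal sketch of that idea, not the authors' method. It assumes the mapped reads have been converted to BED6 and that each read name ends in `_<barcode>`; the function name `dedup_by_barcode` and this naming convention are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: collapse PCR duplicates in a BED6 stream read from stdin.
# ASSUMPTION: column 4 (read name) ends in "_<barcode>", e.g. "read7_ACGTT".
# Reads that share chrom, start, end, strand, and barcode are treated as
# duplicates; only the first one encountered is kept.
dedup_by_barcode() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         n = split($4, parts, "_")        # barcode = text after the last "_"
         key = $1 FS $2 FS $3 FS $6 FS parts[n]
         if (!(key in seen)) {            # first read with this key
           seen[key] = 1
           print
         }
       }'
}
```

A typical call would be `dedup_by_barcode < mapped_reads.bed > dedup_reads.bed`, where both file names are placeholders. Reads at the same position with different barcodes are kept, since they are taken to be independent molecules.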
3
Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.
$ Bash example
# Install Pyicoteo (which includes pyicoclip).
# Pyicoteo was written for Python 2, so a dedicated environment may be needed.
# pip install pyicoteo

# RefSeq genes region file in BED format for hg19, the build used for mapping.
# Example using the UCSC refGene table (columns: chrom, txStart, txEnd, name, score 0, strand):
# wget -O refGene.txt.gz "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz"
# gunzip refGene.txt.gz
# awk 'BEGIN{OFS="\t"} {print $3, $5, $6, $2, 0, $4}' refGene.txt | sort -k1,1 -k2,2n > refseq_genes.bed

# Input reads after duplicate removal. The pyicoclip documentation example uses
# BED input; an aligned BAM file can be converted with bedtools:
# bedtools bamtobed -i path/to/your/ip_sample.bam > path/to/your/ip_sample.bed
IP_BED="path/to/your/ip_sample.bed"

# Define the RefSeq genes region file (e.g., the file generated above)
REFSEQ_REGIONS="path/to/your/refseq_genes.bed"

# Define the output file for pyicoclip results
OUTPUT_FILE="pyicoclip_peaks.pk"

# Run pyicoclip for peak calling. This invocation follows the example in the
# pyicoclip documentation (experiment, output, -f, --region); confirm it against
# pyicoclip --help for your installed version. pyicoclip is designed for CLIP-seq
# data without a control experiment, so no control file is passed.
pyicoclip "${IP_BED}" "${OUTPUT_FILE}" -f bed --region "${REFSEQ_REGIONS}"
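The dataset's supplementary BED files contain the called peaks. As a quick sanity check on such a file, the sketch below counts peaks and reports their mean width. It assumes tab-separated input with at least chrom, start, and end columns; the function name `summarize_peaks` is hypothetical and not part of Pyicoteo.

```shell
#!/usr/bin/env bash
# Sketch: summarize a BED stream of peaks read from stdin.
# ASSUMPTION: at least three tab-separated columns (chrom, start, end).
# "track"/"browser" header lines and "#" comments are skipped.
summarize_peaks() {
  awk 'BEGIN { FS = "\t" }
       /^(track|browser|#)/ { next }
       NF >= 3 { n++; total += $3 - $2 }   # BED intervals are half-open
       END {
         if (n > 0) printf "peaks=%d mean_width=%.1f\n", n, total / n
         else       print  "peaks=0"
       }'
}
```

For example, `summarize_peaks < peaks.bed` (file name is a placeholder) prints a single summary line that can be compared across samples.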
Tools Used
Raw Source Text
Illumina software used for basecalling. Reads were mapped to human genome build hg19 using STAR (https://code.google.com/p/rna-star/) with the "outFilterMultimapNmax 20" option, then PCR duplicates were removed using unique nmers in the barcode sequence. Peak calling was performed using pyicoclip (http://regulatorygenomics.upf.edu/Software/Pyicoteo/pyicoclip.html) using RefSeq genes as the region file.
Genome_build: GRCh37 (hg19)
Supplementary_files_format_and_content: Bed files include peaks.