GSE249247 Processing Pipeline
Publication
Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation. Molecular Cell (2024), PMID 39303722
Dataset
GSE249247: Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation.
Processing Steps
1
*library strategy: Cut&Run
$ Bash example
# Generic alternative, not the submitters' workflow: these CUT&RUN data were processed
# with a modified CUT&RUNTools 2.0 (see step 2). The ENCODE ChIP-seq pipeline
# (ENCODE-DCC/chip-seq-pipeline2) is a WDL workflow run with Caper, not with Nextflow.
# Install Caper and fetch the pipeline (if not already installed):
# pip install caper
# git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2.git
#
# Placeholder input JSON for one paired-end target replicate and one control replicate.
# For CUT&RUN the control is usually an IgG or mock IP sample; the submitters used a
# same-batch V5 mock IP (see step 5). Replace the FASTQ paths and the hg38 genome TSV.
cat << EOF > input.json
{
  "chip.pipeline_type": "tf",
  "chip.genome_tsv": "/path/to/hg38.tsv",
  "chip.paired_end": true,
  "chip.fastqs_rep1_R1": ["sample_chip_R1.fastq.gz"],
  "chip.fastqs_rep1_R2": ["sample_chip_R2.fastq.gz"],
  "chip.ctl_fastqs_rep1_R1": ["sample_control_R1.fastq.gz"],
  "chip.ctl_fastqs_rep1_R2": ["sample_control_R2.fastq.gz"]
}
EOF
# Run locally in Docker containers; the pipeline handles alignment, filtering,
# peak calling and QC. Ensure Docker is installed and running.
caper run chip-seq-pipeline2/chip.wdl -i input.json --docker
2
Data analysis was performed using a modified version of CUT&RUNTools 2.0.
CUT&RUNTools v2.0
$ Bash example
# CUT&RUNTools 2.0 is distributed on GitHub; install its dependencies (Trimmomatic,
# bowtie2, samtools, Picard, MACS2, etc.) as described in the repository README.
git clone https://github.com/fl-yu/CUT-RUNTools-2.0.git
cd CUT-RUNTools-2.0
# The pipeline is driven by a JSON configuration file rather than per-sample flags:
# edit the bulk-analysis config so it points at your software paths, FASTQ directory,
# working directory and genome build (hg38 here).
# Assumed entry point for the bulk module; confirm the exact script name and
# arguments in the README of the release you install.
./run_bulkModule.sh bulk-config.json
# Note: the submitters used a *modified* CUT&RUNTools 2.0. The modifications are not
# described in the source, so this shows stock usage only; steps 3-5 list the
# individual tools and settings they report.
3
Adapters were trimmed with Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences, arising from fragment read-through, that the first round missed.
Trimmomatic v0.36
$ Bash example
# Install Trimmomatic (if not already installed)
# conda install -c bioconda trimmomatic=0.36
# Paired-end input and output files (fragment read-through implies paired-end reads)
READ1_IN="input_R1.fastq.gz"
READ2_IN="input_R2.fastq.gz"
READ1_PAIRED_OUT="output_R1_paired.fastq.gz"
READ1_UNPAIRED_OUT="output_R1_unpaired.fastq.gz"
READ2_PAIRED_OUT="output_R2_paired.fastq.gz"
READ2_UNPAIRED_OUT="output_R2_unpaired.fastq.gz"
# Adapter FASTA shipped in Trimmomatic's 'adapters' directory (replace the path)
ADAPTER_FILE="/path/to/Trimmomatic-0.36/adapters/TruSeq3-PE.fa"
# First round: adapter clipping.
# ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
# Palindrome mode catches most read-through, but very short adapter overhangs can
# remain, which is why the submitters ran a second trimming round (not shown here).
# LEADING/TRAILING/SLIDINGWINDOW/MINLEN are common values, not settings reported in the source.
java -jar /path/to/trimmomatic-0.36.jar PE \
  "${READ1_IN}" "${READ2_IN}" \
  "${READ1_PAIRED_OUT}" "${READ1_UNPAIRED_OUT}" \
  "${READ2_PAIRED_OUT}" "${READ2_UNPAIRED_OUT}" \
  ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
  LEADING:3 \
  TRAILING:3 \
  SLIDINGWINDOW:4:15 \
  MINLEN:36
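The source does not specify how the second trimming round was done. As a minimal sketch of the idea, assuming uncompressed FASTQ on standard input, the hypothetical `trim_overhang` function below clips a partial adapter from the 3' end of each read. The adapter prefix `AGATCGGAAGAGC` (the start shared by the common Illumina R1 and R2 adapters) and the 3 to 6 bp overhang window are assumptions, not the submitters' settings, and this is not the trimming tool bundled with CUT&RUNTools.

```bash
# Hypothetical second-pass trimmer (illustration only).
# Args: adapter prefix (assumed default), max overhang, min overhang (assumed values).
trim_overhang() {
  local adapter="${1:-AGATCGGAAGAGC}" max_ov="${2:-6}" min_ov="${3:-3}"
  awk -v ad="$adapter" -v maxo="$max_ov" -v mino="$min_ov" '
    NR % 4 == 1 { hdr = $0 }   # @read header
    NR % 4 == 2 { seq = $0 }   # sequence
    NR % 4 == 3 { plus = $0 }  # "+" separator
    NR % 4 == 0 {              # quality line: decide the cut, then emit the record
      cut = 0
      # Longest overhang first: does the read end with the first k adapter bases?
      for (k = maxo; k >= mino; k--) {
        if (k <= length(seq) && substr(seq, length(seq) - k + 1) == substr(ad, 1, k)) {
          cut = k
          break
        }
      }
      n = length(seq) - cut
      print hdr
      print substr(seq, 1, n)
      print plus
      print substr($0, 1, n)   # trim qualities to the same length
    }'
}
```

Because every record is written out, running it separately on R1 and R2 keeps mates in the same order, e.g. `zcat output_R1_paired.fastq.gz | trim_overhang | gzip > output_R1_trimmed.fastq.gz`.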
4
Reads were aligned to hg38 using bowtie2 (v2.3.5.1) with the preset "--very-sensitive-local", a minimum fragment length of 10, and a maximum fragment length of 700.
$ Bash example
# Install bowtie2 and samtools (if not already installed)
# conda install -c bioconda bowtie2 samtools
# Path prefix of the hg38 bowtie2 index (replace with your own)
HG38_INDEX="path/to/hg38_bowtie2_index/hg38"
# Trimmed paired-end reads from step 3 (replace with your own files)
READS_R1="input_R1.fastq.gz"
READS_R2="input_R2.fastq.gz"
OUTPUT_BAM="aligned.bam"
# --very-sensitive-local: sensitive local-alignment preset
# -I 10 / -X 700: minimum / maximum fragment length for valid paired-end alignments
# Without -S, bowtie2 writes SAM to stdout; samtools converts it to BAM.
bowtie2 --very-sensitive-local -I 10 -X 700 \
  -x "${HG38_INDEX}" -1 "${READS_R1}" -2 "${READS_R2}" \
  | samtools view -b -o "${OUTPUT_BAM}" -
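To check that the 10 to 700 bp fragment bounds took effect, one can summarise the template lengths (SAM column 9, TLEN) of concordant pairs. A minimal sketch, assuming SAM records on standard input such as the output of `samtools view -f 66 aligned.bam` (flag 66 = properly paired + first mate, so each pair is counted once); `fragment_stats` is a hypothetical helper name.

```bash
# Hypothetical helper: summarise |TLEN| and count fragments outside [min, max].
fragment_stats() {
  local min_len="${1:-10}" max_len="${2:-700}"
  awk -F '\t' -v lo="$min_len" -v hi="$max_len" '
    !/^@/ && $9 != 0 {                 # skip header lines and unpaired records
      len = ($9 < 0) ? -$9 : $9        # TLEN is negative for the downstream mate
      n++; sum += len
      if (n == 1 || len < mn) mn = len
      if (n == 1 || len > mx) mx = len
      if (len < lo || len > hi) out++
    }
    END {
      if (n == 0) { print "no paired records"; exit }
      printf "pairs=%d min=%d max=%d mean=%.1f outside_%d_%d=%d\n", n, mn, mx, sum / n, lo, hi, out + 0
    }'
}
```

For example, `samtools view -f 66 aligned.bam | fragment_stats 10 700` should report zero fragments outside the bounds if the bowtie2 constraint was applied to the concordant pairs.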
5
After PCR duplicates were removed with Picard (v0.1.8), peaks were called with MACS2 (v2.2.7.1) using the default narrowPeak settings, with the same-batch V5 mock IP sample as the normalization control.
MACS2 v2.2.7.1
$ Bash example
# Install Picard, samtools and MACS2 (if not already installed)
# conda install -c bioconda picard samtools macs2
# Aligned BAMs from step 4: target IP and the same-batch V5 mock IP control (replace paths)
TREATMENT_BAM="treatment.bam"
CONTROL_BAM="control.bam"
OUTPUT_PREFIX="my_experiment_macs2_peaks"
OUTPUT_DIR="macs2_output"
mkdir -p "${OUTPUT_DIR}"
# Remove PCR duplicates with Picard MarkDuplicates (input must be coordinate-sorted)
for BAM in "${TREATMENT_BAM}" "${CONTROL_BAM}"; do
  samtools sort -o "${BAM%.bam}.sorted.bam" "${BAM}"
  picard MarkDuplicates \
    I="${BAM%.bam}.sorted.bam" \
    O="${BAM%.bam}.dedup.bam" \
    M="${BAM%.bam}.dup_metrics.txt" \
    REMOVE_DUPLICATES=true
done
# Call peaks with MACS2 default narrowPeak settings (default q-value cutoff 0.05)
# -t / -c: deduplicated target and mock IP control
# -f BAM: treats reads as single-end; -f BAMPE would use whole fragments (the source
#   does not say which was used)
# -g hs: human effective genome size
# -n: prefix for output files; --outdir: output directory
macs2 callpeak \
  -t "${TREATMENT_BAM%.bam}.dedup.bam" \
  -c "${CONTROL_BAM%.bam}.dedup.bam" \
  -f BAM \
  -g hs \
  -n "${OUTPUT_PREFIX}" \
  --outdir "${OUTPUT_DIR}"
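MACS2 writes the -log10 q-value of each peak in column 9 of its narrowPeak output, so the q<0.05 criterion corresponds to column 9 > -log10(0.05) ≈ 1.301. The hypothetical `filter_narrowpeak_q` helper below re-applies that threshold; it is mainly useful for tightening the cutoff (for example to 0.01) on the deposited files.

```bash
# Hypothetical helper: keep narrowPeak rows whose q-value is below the cutoff.
# Column 9 holds -log10(q); MACS2 writes -1 when it was not computed.
filter_narrowpeak_q() {
  local q_cutoff="${1:-0.05}"
  awk -v q="$q_cutoff" '
    BEGIN { FS = OFS = "\t"; thr = -log(q) / log(10) }  # convert cutoff to -log10 scale
    $9 > thr                                             # print rows passing the threshold
  '
}
```

For example, `filter_narrowpeak_q 0.01 < macs2_output/my_experiment_macs2_peaks_peaks.narrowPeak > peaks_q0.01.narrowPeak`.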
Raw Source Text
*library strategy: Cut&Run
Data analysis was performed using a modified version of CUT&RUNTools 2.0. Adapters were trimmed using Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences not removed due to fragment read-through. Reads were aligned to hg38 using bowtie2 (v2.3.5.1) using preset "--very-sensitive-local," minimum fragment length 10, and maximum fragment length 700. After removing PCR duplicates with Picard (v0.1.8), peaks were called using MACS2 (v2.2.7.1) on the default narrowPeak setting using the same-batch V5 mock IP sample as a normalization control.
Assembly: hg38
Supplementary files format and content: narrowPeak files (q<0.05) output by MACS2