GSE249247 Processing Pipeline

Publication

Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation.

Molecular Cell (2024), PMID 39303722

Dataset

GSE249247

Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation.

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    *library strategy: Cut&Run

    $ Bash example
    # NOTE: "library strategy: Cut&Run" is a metadata field, not a processing step.
    # The source describes a modified CUT&RUNTools 2.0 workflow (steps 2-5); the ENCODE
    # ChIP-seq pipeline below is only a generic alternative and is not the authors' method.
    # It is a WDL workflow launched with Caper, not a Nextflow pipeline.
    # pip install caper
    # git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
    
    # Create a placeholder input JSON file for the ENCODE ChIP-seq pipeline.
    # This example assumes paired-end reads for a single replicate and a single control.
    # Replace the FASTQ names and the genome TSV path with your actual file paths.
    # For Cut&Run, the control might be an IgG or mock IP sample, depending on the experimental design.
    cat << EOF > input.json
    {
      "chip.title": "cutrun_example",
      "chip.pipeline_type": "tf",
      "chip.genome_tsv": "/path/to/hg38.tsv",
      "chip.paired_end": true,
      "chip.fastqs_rep1_R1": ["sample_chip_R1.fastq.gz"],
      "chip.fastqs_rep1_R2": ["sample_chip_R2.fastq.gz"],
      "chip.ctl_fastqs_rep1_R1": ["sample_control_R1.fastq.gz"],
      "chip.ctl_fastqs_rep1_R2": ["sample_control_R2.fastq.gz"]
    }
    EOF
    
    # Run the workflow with Caper; it handles alignment, peak calling, and quality control.
    # 'chip.genome_tsv' points to a genome database file built for hg38 (GRCh38) as a placeholder;
    # use a different build (e.g., mm10 for mouse) if your organism differs.
    # '--docker' runs each task in a Docker container; ensure Docker is installed and running.
    caper run chip-seq-pipeline2/chip.wdl -i input.json --docker
  2.

    Data analysis was performed using a modified version of CUT&RUNTools 2.0.

    CUT&RUNTools v2.0
    $ Bash example
    # CUT&RUNTools 2.0 is a shell/JSON-driven workflow rather than a single Python script.
    # It starts from FASTQ files and itself runs trimming, alignment, duplicate removal,
    # and peak calling (the operations described in steps 3-5).
    # The repository URL, script name, and config name below are assumptions based on the
    # project's README; confirm them against your checkout.
    # git clone https://github.com/fl-yu/CUT-RUNTools-2.0
    # cd CUT-RUNTools-2.0
    
    # Placeholders (not from the source)
    CONFIG_JSON="bulk-config.json"   # Edited copy of the bulk-analysis config shipped with the repository
    SAMPLE_NAME="sample1"            # Sample name; must match your FASTQ file naming
    
    # Set the FASTQ directory, working directory, genome build, and peak caller in the config file.
    # Note: the source mentions a "modified version" whose changes are not described,
    # so this standard invocation may not reproduce the authors' results exactly.
    ./run_bulkModule.sh "${CONFIG_JSON}" "${SAMPLE_NAME}"
    
  3.

    Adapters were trimmed using Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences not removed due to fragment read-through.

    Trimmomatic v0.36
    $ Bash example
    # Install Trimmomatic (if not already installed)
    # conda install -c bioconda trimmomatic=0.36
    
    # Define input and output file names
    # Assuming paired-end reads based on "fragment read-through"
    READ1_IN="input_R1.fastq.gz"
    READ2_IN="input_R2.fastq.gz"
    READ1_PAIRED_OUT="output_R1_paired.fastq.gz"
    READ1_UNPAIRED_OUT="output_R1_unpaired.fastq.gz"
    READ2_PAIRED_OUT="output_R2_paired.fastq.gz"
    READ2_UNPAIRED_OUT="output_R2_unpaired.fastq.gz"
    
    # Define adapter file path (replace with the actual path to your adapter file).
    # Trimmomatic ships adapter files in its 'adapters' directory, e.g., TruSeq3-PE.fa.
    # The adapter set and parameters used by the authors are not stated in the source.
    ADAPTER_FILE="/path/to/Trimmomatic-0.36/adapters/TruSeq3-PE.fa"
    
    # Run Trimmomatic for the first round of adapter trimming and basic quality filtering.
    # ILLUMINACLIP 2:30:10 = seed mismatches : palindrome clip threshold : simple clip threshold.
    # The source describes a separate second round of trimming that removes adapter overhangs
    # left by fragment read-through; that second step is not shown here.
    # LEADING, TRAILING, SLIDINGWINDOW, and MINLEN are typical values, not the authors' settings.
    java -jar /path/to/trimmomatic-0.36.jar PE \
        "${READ1_IN}" "${READ2_IN}" \
        "${READ1_PAIRED_OUT}" "${READ1_UNPAIRED_OUT}" \
        "${READ2_PAIRED_OUT}" "${READ2_UNPAIRED_OUT}" \
        ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
        LEADING:3 \
        TRAILING:3 \
        SLIDINGWINDOW:4:15 \
        MINLEN:36
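
After trimming, it can be useful to confirm how many read pairs survived and that the two paired output files are still in sync. The following is a minimal sketch, not part of the described pipeline: `count_reads` and `report_pairs` are hypothetical helper names, and the code assumes gzipped FASTQ files with standard four-line records.

```shell
# count_reads FILE: print the number of records in a gzipped FASTQ file.
# Assumes four lines per record (no wrapped sequence or quality lines).
count_reads() {
    local n_lines
    n_lines=$(gzip -dc "$1" | wc -l)
    echo $(( n_lines / 4 ))
}

# report_pairs R1 R2: print both read counts and return non-zero
# if the two mate files contain different numbers of records.
report_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=${n1} R2=${n2}"
    [ "${n1}" -eq "${n2}" ]
}
```

With the file names from the example above, this could be called as `report_pairs output_R1_paired.fastq.gz output_R2_paired.fastq.gz`.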
  4.

    Reads were aligned to hg38 using bowtie2 (v2.3.5.1) using preset “--very-sensitive-local,” minimum fragment length 10, and maximum fragment length 700.

    $ Bash example
    # Install bowtie2 (if not already installed)
    # conda install -c bioconda bowtie2
    
    # Install samtools (if not already installed, for converting SAM to BAM)
    # conda install -c bioconda samtools
    
    # Define reference genome index path
    # Replace with the actual path to your hg38 bowtie2 index files
    HG38_INDEX="path/to/hg38_bowtie2_index/hg38"
    
    # Define input FASTQ files (assuming paired-end reads based on fragment length parameters)
    # Replace with your actual input read files
    READS_R1="input_R1.fastq.gz"
    READS_R2="input_R2.fastq.gz"
    
    # Define output BAM file
    OUTPUT_BAM="aligned.bam"
    
    # Align reads to hg38 using bowtie2
    # --very-sensitive-local: preset for sensitive local alignment
    # -I 10: minimum fragment length 10
    # -X 700: maximum fragment length 700
    # -x ${HG38_INDEX}: path to the reference genome index
    # -1 ${READS_R1}: first mate reads
    # -2 ${READS_R2}: second mate reads
    # With no -S option, bowtie2 writes SAM to stdout
    # | samtools view -b - : convert the SAM stream from stdin to BAM
    # > ${OUTPUT_BAM}: redirect samtools output to the BAM file
    bowtie2 --very-sensitive-local -I 10 -X 700 -x "${HG38_INDEX}" -1 "${READS_R1}" -2 "${READS_R2}" \
        | samtools view -b - > "${OUTPUT_BAM}"
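
The alignment step restricts concordant fragments to 10-700 bp (`-I 10 -X 700`). One way to sanity-check the result is to tally the template length (TLEN, SAM column 9) of aligned pairs. This is a sketch under stated assumptions rather than part of the described pipeline: `fragment_summary` is a hypothetical helper that reads SAM text on standard input.

```shell
# fragment_summary MIN MAX: read SAM records on stdin and report how many
# fragments fall inside [MIN, MAX]. Only records with a positive TLEN
# (SAM column 9) are counted, so each aligned pair is counted once.
fragment_summary() {
    awk -F'\t' -v min="$1" -v max="$2" '
        /^@/ { next }                      # skip SAM header lines
        $9 > 0 {
            total++
            if ($9 >= min && $9 <= max) in_range++
        }
        END { printf "total=%d in_range=%d\n", total, in_range }
    '
}
```

With samtools installed, a possible call is `samtools view aligned.bam | fragment_summary 10 700`.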
  5.

    After removing PCR duplicates with Picard (v0.1.8), peaks were called using MACS2 (v2.2.7.1) on the default narrowPeak setting using the same-batch V5 mock IP sample as a normalization control.
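
The source does not give the Picard invocation used for duplicate removal. As a rough sketch, one common approach is to coordinate-sort the BAM and run Picard `MarkDuplicates` with `REMOVE_DUPLICATES=true`. Everything here is an assumption for illustration: the `run` and `dedup_bam` helpers and the file names are hypothetical, the `picard` wrapper command is the one a conda install typically provides, and newer Picard releases prefer the `-I`/`-O`/`-M` argument style. The block defaults to a dry run that only prints the commands.

```shell
# run CMD...: with DRY_RUN=1 (the default here) print the command instead
# of executing it, so the sketch can be inspected without the tools installed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# dedup_bam IN_BAM OUT_PREFIX: coordinate-sort, remove duplicates, index.
dedup_bam() {
    local in_bam="$1" prefix="$2"
    run samtools sort -o "${prefix}.sorted.bam" "${in_bam}"
    run picard MarkDuplicates \
        I="${prefix}.sorted.bam" \
        O="${prefix}.dedup.bam" \
        M="${prefix}.dup_metrics.txt" \
        REMOVE_DUPLICATES=true
    run samtools index "${prefix}.dedup.bam"
}

# Print the three commands for a placeholder sample.
dedup_bam aligned.bam sample
```

Setting `DRY_RUN=0` before the call executes the commands instead of printing them.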

    MACS2 v2.2.7.1
    $ Bash example
    # Install MACS2 (if not already installed)
    # conda install -c bioconda macs2
    
    # Define input files and output prefix
    TREATMENT_BAM="treatment.bam" # Replace with your actual treatment BAM file (after PCR duplicate removal)
    CONTROL_BAM="control.bam"     # Replace with your actual control BAM file (V5 mock IP, after PCR duplicate removal)
    OUTPUT_PREFIX="my_experiment_macs2_peaks"
    OUTPUT_DIR="macs2_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run MACS2 callpeak with default narrowPeak settings
    # -t: Treatment file (IP sample)
    # -c: Control file (V5 mock IP sample)
    # -f BAM: Input file format. With BAM, MACS2 treats reads as single-end; the upstream steps imply paired-end data, for which BAMPE uses whole fragments. The source does not state which was used.
    # -g hs: Effective genome size (e.g., 'hs' for human, 'mm' for mouse). Adjust if needed for your organism.
    # -n: Experiment name, used as prefix for output files
    # --outdir: Specify output directory
    macs2 callpeak \
      -t "${TREATMENT_BAM}" \
      -c "${CONTROL_BAM}" \
      -f BAM \
      -g hs \
      -n "${OUTPUT_PREFIX}" \
      --outdir "${OUTPUT_DIR}"
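
The supplementary files are described as narrowPeak files with q<0.05, which matches MACS2's default q-value cutoff. To apply a stricter cutoff after the fact, the narrowPeak records can be filtered on column 9, which holds -log10(q-value). The following is a minimal sketch; `filter_narrowpeak_q` is a hypothetical helper name and is not part of MACS2.

```shell
# filter_narrowpeak_q Q_CUTOFF: keep narrowPeak records on stdin whose
# q-value is below Q_CUTOFF. Column 9 holds -log10(q-value), so a record
# passes when column 9 is greater than -log10(Q_CUTOFF).
filter_narrowpeak_q() {
    awk -F'\t' -v q="$1" '
        BEGIN { threshold = -log(q) / log(10) }
        $9 > threshold
    '
}
```

With the output names from the example above, a possible call is `filter_narrowpeak_q 0.01 < macs2_output/my_experiment_macs2_peaks_peaks.narrowPeak > filtered.narrowPeak`.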

Tools Used

CUT&RUNTools 2.0 (modified), Trimmomatic v0.36, bowtie2 v2.3.5.1, Picard v0.1.8, MACS2 v2.2.7.1

Raw Source Text
*library strategy: Cut&Run
Data analysis was performed using a modified version of CUT&RUNTools 2.0. Adapters were trimmed using Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences not removed due to fragment read-through. Reads were aligned to hg38 using bowtie2 (v2.3.5.1) using preset “--very-sensitive-local,” minimum fragment length 10, and maximum fragment length 700. After removing PCR duplicates with Picard (v0.1.8), peaks were called using MACS2 (v2.2.7.1) on the default narrowPeak setting using the same-batch V5 mock IP sample as a normalization control.
Assembly: hg38
Supplementary files format and content: narrowPeak files (q<0.05) output by MACS2