GSE249247 Processing Pipeline

Publication

Integrated multi-omics analysis of zinc-finger proteins uncovers roles in RNA regulation.

Molecular cell (2024) — PMID 39303722

Dataset

Integrated multi-omics analysis of zinc finger proteins uncovers roles in RNA regulation.

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

*library strategy: Cut&Run

ChIP-seq v2.0.0

$ Bash example

# Note: the source text does not name this pipeline; it is shown as an inferred, generic alternative.
# The ENCODE ChIP-seq pipeline is written in WDL and is run with Caper, not Nextflow.
# Install Caper if not already installed, then initialize it for local runs
# pip install caper
# caper init local

# Get the pipeline definition (chip.wdl)
# git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2

# Create a placeholder input JSON file for the ENCODE ChIP-seq pipeline.
# This example assumes paired-end reads for a single ChIP replicate and a single control replicate.
# Replace 'sample_chip_R1.fastq.gz', 'sample_chip_R2.fastq.gz', etc., with your actual file paths.
# For Cut&Run, the control might be an IgG or mock IP sample, depending on the experimental design.
# 'chip.genome_tsv' must point to the pipeline's genome TSV for hg38; the path below is a placeholder.
cat << EOF > input.json
{
  "chip.title": "example",
  "chip.pipeline_type": "tf",
  "chip.genome_tsv": "path/to/hg38.tsv",
  "chip.paired_end": true,
  "chip.ctl_paired_end": true,
  "chip.fastqs_rep1_R1": ["sample_chip_R1.fastq.gz"],
  "chip.fastqs_rep1_R2": ["sample_chip_R2.fastq.gz"],
  "chip.ctl_fastqs_rep1_R1": ["sample_control_R1.fastq.gz"],
  "chip.ctl_fastqs_rep1_R2": ["sample_control_R2.fastq.gz"]
}
EOF

# Run the ENCODE DCC ChIP-seq pipeline with Caper.
# The pipeline handles alignment, peak calling, and quality control.
# '--docker' uses Docker containers for reproducibility; ensure Docker is installed and running.
# Outputs are written under the current working directory.
caper run chip-seq-pipeline2/chip.wdl -i input.json --docker

Data analysis was performed using a modified version of CUT&RUNTools 2.0.

CUT&RUNTools v2.0

$ Bash example

# Example installation (see the repository README for the full dependency list and install steps)
# git clone https://github.com/fl-yu/CUT-RUNTools-2.0.git
# cd CUT-RUNTools-2.0

# CUT&RUNTools 2.0 starts from raw paired-end FASTQ files and is driven by a JSON configuration file,
# not by per-run command-line flags. All paths below are placeholders.
CUTRUNTOOLS_DIR="path/to/CUT-RUNTools-2.0"
CONFIG_JSON="path/to/bulk-config.json" # Edited copy of the repository's bulk-config.json (FASTQ directory, output directory, genome, tool paths)
SAMPLE_NAME="sample"                   # Sample prefix; see the repository's usage notes for the expected FASTQ naming

# Execute the bulk-data module of CUT&RUNTools 2.0
# Note: The description mentions a "modified version". This command represents the standard usage of CUT&RUNTools 2.0.
# Specific modifications would need to be applied to the scripts themselves or reflected in the configuration file.
# Settings such as the genome (hg38) and the peak caller (macs2 or SEACR) are chosen in the JSON file.
"${CUTRUNTOOLS_DIR}/run_bulkModule.sh" "${CONFIG_JSON}" "${SAMPLE_NAME}"

Adapters were trimmed using Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences not removed due to fragment read-through.

Trimmomatic v0.36

$ Bash example

# Install Trimmomatic (if not already installed)
# conda install -c bioconda trimmomatic=0.36

# Define input and output file names
# Assuming paired-end reads based on "fragment read-through"
READ1_IN="input_R1.fastq.gz"
READ2_IN="input_R2.fastq.gz"
READ1_PAIRED_OUT="output_R1_paired.fastq.gz"
READ1_UNPAIRED_OUT="output_R1_unpaired.fastq.gz"
READ2_PAIRED_OUT="output_R2_paired.fastq.gz"
READ2_UNPAIRED_OUT="output_R2_unpaired.fastq.gz"

# Define adapter file path (replace with actual path to your adapter file)
# Trimmomatic comes with adapter files in its 'adapters' directory, e.g., TruSeq3-PE.fa
# This command covers only the first round of trimming described above.
# The second round (removal of leftover adapter overhang from read-through) is a separate step;
# CUT&RUNTools performs it with its bundled kseq trimmer.
ADAPTER_FILE="/path/to/Trimmomatic-0.36/adapters/TruSeq3-PE.fa"

# Execute Trimmomatic for adapter trimming and basic quality filtering.
# In paired-end mode, ILLUMINACLIP's palindrome check detects adapter read-through on short fragments.
# The ILLUMINACLIP parameters 2:30:10 are seed mismatches : palindrome clip threshold : simple clip threshold.
# The quality-trimming steps (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) are typical values, not taken from the source.
# MINLEN:36 discards reads shorter than 36 bp after trimming; lower it if very short fragments matter.
java -jar /path/to/trimmomatic-0.36.jar PE \
    "${READ1_IN}" "${READ2_IN}" \
    "${READ1_PAIRED_OUT}" "${READ1_UNPAIRED_OUT}" \
    "${READ2_PAIRED_OUT}" "${READ2_UNPAIRED_OUT}" \
    ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
    LEADING:3 \
    TRAILING:3 \
    SLIDINGWINDOW:4:15 \
    MINLEN:36

Reads were aligned to hg38 with bowtie2 (v2.3.5.1) using the preset "--very-sensitive-local," a minimum fragment length of 10, and a maximum fragment length of 700.

Bowtie2 v2.3.5.1

$ Bash example

# Install bowtie2 (if not already installed)
# conda install -c bioconda bowtie2

# Install samtools (if not already installed, for converting SAM to BAM)
# conda install -c bioconda samtools

# Define reference genome index path
# Replace with the actual path to your hg38 bowtie2 index files
HG38_INDEX="path/to/hg38_bowtie2_index/hg38"

# Define input FASTQ files (assuming paired-end reads based on fragment length parameters)
# Replace with your actual input read files
READS_R1="input_R1.fastq.gz"
READS_R2="input_R2.fastq.gz"

# Define output BAM file
OUTPUT_BAM="aligned.bam"

# Align reads to hg38 using bowtie2
# --very-sensitive-local: preset for sensitive local alignment
# -I 10: minimum fragment length 10
# -X 700: maximum fragment length 700
# -x ${HG38_INDEX}: path to the reference genome index
# -1 ${READS_R1}: first mate reads
# -2 ${READS_R2}: second mate reads
# bowtie2 writes SAM to stdout by default, so no -S option is needed
# | samtools view -b - : pipe SAM from stdout to samtools to convert to BAM
# > ${OUTPUT_BAM}: redirect samtools output to the BAM file
bowtie2 --very-sensitive-local -I 10 -X 700 -x "${HG38_INDEX}" -1 "${READS_R1}" -2 "${READS_R2}" | samtools view -b - > "${OUTPUT_BAM}"
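
As a quick sanity check on the fragment-length limits, the sketch below tallies how many read pairs fall inside the 10 to 700 bp window. It reads SAM text on stdin and uses the signed template length in column 9 (TLEN). The function name and the idea of running this check are illustrative assumptions, not part of the source workflow.

```shell
# Summarize fragment lengths from SAM records read on stdin.
# Assumption: input is SAM text, e.g. from `samtools view -f 2 aligned.bam`
# (properly paired reads only). Arguments default to the -I 10 / -X 700 limits.
fragment_length_summary() {
    awk -v lo="${1:-10}" -v hi="${2:-700}" '
        /^@/ { next }                 # skip SAM header lines
        $9 > 0 {                      # positive TLEN: count each pair once
            total++
            if ($9 >= lo && $9 <= hi) in_range++
        }
        END { printf "pairs=%d in_range=%d\n", total, in_range }
    '
}

# Usage (hypothetical file name):
# samtools view -f 2 aligned.bam | fragment_length_summary 10 700
```

Most properly paired reads would be expected to fall in range; a large shortfall suggests the alignment settings or the input file differ from what is described here.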

After removing PCR duplicates with Picard (v0.1.8), peaks were called using MACS2 (v2.2.7.1) on the default narrowPeak setting using the same-batch V5 mock IP sample as a normalization control.
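
The duplicate-removal step has no example on this page, so here is a minimal sketch of one way to do it. It assumes `samtools` and the conda `picard` wrapper are on the PATH; the function name and file names are placeholders, and the exact Picard options used by the authors are not stated in the source.

```shell
# Sort a BAM by coordinate and remove PCR duplicates with Picard MarkDuplicates.
# Assumptions: `samtools` and `picard` are on PATH; names below are placeholders.
remove_duplicates() {
    in_bam="$1"        # aligned BAM, e.g. the bowtie2 output
    out_prefix="$2"    # prefix for the sorted and deduplicated files
    # MarkDuplicates expects coordinate-sorted input.
    samtools sort -o "${out_prefix}.sorted.bam" "${in_bam}" &&
    picard MarkDuplicates \
        I="${out_prefix}.sorted.bam" \
        O="${out_prefix}.dedup.bam" \
        M="${out_prefix}.dup_metrics.txt" \
        REMOVE_DUPLICATES=true
}

# Usage (run once for the IP sample and once for the V5 mock IP control):
# remove_duplicates aligned.bam treatment
```

REMOVE_DUPLICATES=true drops duplicate reads from the output instead of only flagging them, which matches "removing PCR duplicates" as described.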

MACS2 v2.2.7.1

$ Bash example

# Install MACS2 (if not already installed)
# conda install -c bioconda macs2

# Define input files and output prefix
TREATMENT_BAM="treatment.bam" # Replace with your actual treatment BAM file (after PCR duplicate removal)
CONTROL_BAM="control.bam"     # Replace with your actual control BAM file (V5 mock IP, after PCR duplicate removal)
OUTPUT_PREFIX="my_experiment_macs2_peaks"
OUTPUT_DIR="macs2_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run MACS2 callpeak with default narrowPeak settings
# -t: Treatment file (IP sample)
# -c: Control file (V5 mock IP sample)
# -f BAMPE: Paired-end BAM input, consistent with the paired-end alignment step; fragment sizes are taken from the read pairs.
# -g hs: Effective genome size (e.g., 'hs' for human, 'mm' for mouse). Adjust if needed for your organism.
# -n: Experiment name, used as prefix for output files
# --outdir: Specify output directory
# The default q-value cutoff (0.05) is left unchanged.
macs2 callpeak \
  -t "${TREATMENT_BAM}" \
  -c "${CONTROL_BAM}" \
  -f BAMPE \
  -g hs \
  -n "${OUTPUT_PREFIX}" \
  --outdir "${OUTPUT_DIR}"
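
The deposited narrowPeak files are described as q<0.05. Column 9 of a narrowPeak file holds -log10(q-value), so that threshold corresponds to column 9 > 1.301. The sketch below filters on that column; the function name is hypothetical. Because MACS2's default cutoff is already q 0.05, it is mainly useful for checking a file or applying a stricter threshold.

```shell
# Keep narrowPeak records with q-value below a threshold (default 0.05).
# narrowPeak column 9 is -log10(q-value), so q < qmax means column 9 > -log10(qmax).
filter_narrowpeak_by_q() {
    awk -v qmax="${2:-0.05}" '
        BEGIN { FS = OFS = "\t"; cutoff = -log(qmax) / log(10) }
        $9 > cutoff
    ' "$1"
}

# Usage (file name follows the MACS2 example above):
# filter_narrowpeak_by_q macs2_output/my_experiment_macs2_peaks_peaks.narrowPeak 0.05 > peaks.q05.narrowPeak
```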

Tools Used

ChIP-seq Bowtie2

Raw Source Text

*library strategy: Cut&Run
Data analysis was performed using a modified version of CUT&RUNTools 2.0. Adapters were trimmed using Trimmomatic (v0.36), followed by a second round of trimming to remove any remaining adapter overhang sequences not removed due to fragment read-through. Reads were aligned to hg38 using bowtie2 (v2.3.5.1) using preset "--very-sensitive-local," minimum fragment length 10, and maximum fragment length 700. After removing PCR duplicates with Picard (v0.1.8), peaks were called using MACS2 (v2.2.7.1) on the default narrowPeak setting using the same-batch V5 mock IP sample as a normalization control.
Assembly: hg38
Supplementary files format and content: narrowPeak files (q<0.05) output by MACS2
