GSE181136 Processing Pipeline
Publication
Identification of the global miR-130a targetome reveals a role for TBL1XR1 in hematopoietic stem cell self-renewal and t(8;21) AML.Cell reports (2022) — PMID 35263585
Dataset
GSE181136Global Identification of the miR-130a targetome in human HSPC Reveals a Novel Role for TBL1XR1 in Hematopoietic Stem Cell Self-Renewal and t(8;21) A…
Processing Steps
Generate Jupyter Notebook-
1
Library strategy: CUT&RUN
$ Bash example
# Install Nextflow if not already installed # curl -s https://get.nextflow.io | bash # mv nextflow /usr/local/bin/ # Create a dummy directory for input FASTQ files # mkdir -p data # Create dummy FASTQ files for demonstration purposes. # In a real scenario, replace these with your actual gzipped FASTQ files. # touch data/H3K27me3_rep1_R1.fastq.gz data/H3K27me3_rep1_R2.fastq.gz # touch data/IgG_rep1_R1.fastq.gz data/IgG_rep1_R2.fastq.gz # Create a samples.tsv file to define input samples for the ENCODE ChIP-seq pipeline. # For CUT&RUN, this typically includes target samples (e.g., H3K27me3) and an IgG control. # The 'control_id' column links target samples to their corresponding control. cat << EOF > samples.tsv group replicate fastq_rep1_R1 fastq_rep1_R2 fastq_rep2_R1 fastq_rep2_R2 antibody control_id H3K27me3_sample 1 data/H3K27me3_rep1_R1.fastq.gz data/H3K27me3_rep1_R2.fastq.gz H3K27me3 IgG_control IgG_control 1 data/IgG_rep1_R1.fastq.gz data/IgG_rep1_R2.fastq.gz IgG EOF # Run the ENCODE ChIP-seq pipeline using Nextflow. # -profile docker: Uses Docker for reproducible execution. Ensure Docker is installed and running. # --input samples.tsv: Specifies the input sample sheet created above. # --genome GRCh38: Specifies the reference genome (e.g., human hg38/GRCh38). # --paired_end true: Indicates that the input reads are paired-end. # --outdir results: Specifies the directory where pipeline outputs will be stored. nextflow run ENCODE-DCC/chip-seq-pipeline2 \ -profile docker \ --input samples.tsv \ --genome GRCh38 \ --paired_end true \ --outdir results -
2
CUT&RUN paired-end data was trimmed using fastp v.
$ Bash example
# Install fastp (if not already installed) # conda install -c bioconda fastp # Define input and output file names (placeholders) INPUT_R1="sample_R1.fastq.gz" INPUT_R2="sample_R2.fastq.gz" OUTPUT_R1="sample_R1.trimmed.fastq.gz" OUTPUT_R2="sample_R2.trimmed.fastq.gz" REPORT_JSON="sample.fastp.json" REPORT_HTML="sample.fastp.html" # Execute fastp for paired-end trimming fastp \ -i "${INPUT_R1}" \ -I "${INPUT_R2}" \ -o "${OUTPUT_R1}" \ -O "${OUTPUT_R2}" \ -j "${REPORT_JSON}" \ -h "${REPORT_HTML}" -
3
0.19.5 to remove adapters and low quality base pairs with base pair quality score <30 and read length <35 bp.
$ Bash example
# Install cutadapt (example using conda) # conda install -c bioconda cutadapt=2.10 # Define input and output files INPUT_R1="input_R1.fastq.gz" INPUT_R2="input_R2.fastq.gz" OUTPUT_R1="trimmed_R1.fastq.gz" OUTPUT_R2="trimmed_R2.fastq.gz" # Define adapter sequence(s). Replace with the actual adapter sequence used in your library preparation. # For Illumina TruSeq, a common 3' adapter for Read 1 is: # ADAPTER_3PRIME_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # For eCLIP, a specific linker adapter might be used, e.g., from the Yeo lab eCLIP protocol. # Example placeholder adapter: ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Execute cutadapt for adapter trimming and quality/length filtering # -a ADAPTER_SEQUENCE: 3' adapter for Read 1 # -A ADAPTER_SEQUENCE: 3' adapter for Read 2 (often the same or reverse complement depending on library) # -q 30,30: Trim low-quality bases from both ends with quality score < 30 # -m 35: Discard reads shorter than 35 bp after trimming # --pair-filter=any: Discard a read pair if either read becomes too short after trimming # -o: Output file for Read 1 # -p: Output file for Read 2 cutadapt -a "${ADAPTER_SEQUENCE}" -A "${ADAPTER_SEQUENCE}" \ -q 30,30 \ -m 35 \ --pair-filter=any \ -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \ "${INPUT_R1}" "${INPUT_R2}" -
4
Trimmed reads were aligned to the hg38 human reference genome using bowtie2 v.
$ Bash example
# Installation # conda install -c bioconda bowtie2 samtools # Reference genome preparation (example, replace with actual hg38 fasta path) # mkdir -p /path/to/hg38_bowtie2_index # wget -O /path/to/hg38_bowtie2_index/hg38.fa.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz # gunzip /path/to/hg38_bowtie2_index/hg38.fa.gz # bowtie2-build /path/to/hg38_bowtie2_index/hg38.fa /path/to/hg38_bowtie2_index/hg38 # Alignment # Assume trimmed_reads.fastq.gz is the input file (single-end example) # Assume /path/to/hg38_bowtie2_index/hg38 is the prefix for the bowtie2 index files # Assume output will be aligned_reads.bam bowtie2 --local --very-sensitive-local \ -x /path/to/hg38_bowtie2_index/hg38 \ -U trimmed_reads.fastq.gz \ -p 8 \ 2> bowtie2_alignment.log \ | samtools view -bS - \ | samtools sort -o aligned_reads.bam - samtools index aligned_reads.bam -
5
2.3.5.
Unknown (Inferred with models/gemini-2.5-flash) vUnknown (Inferred with models/gemini-2.5-flash)$ Bash example
# No specific tool or command can be inferred from the step description "2.3.5.". # Please provide a more detailed description of the step (e.g., "Peak calling for eCLIP"). # A common reference genome placeholder might be: # REF_GENOME="/path/to/reference/genome/hg38.fa"
-
6
Reads were also aligned to the S. cerevisiae yeast genome (sacCer3) yeast genome to evaluate the yeast spike-in control for the normalization.
$ Bash example
# Install STAR if not already installed # conda install -c bioconda star # Define variables # Replace with your actual fastq file(s). For single-end reads, remove READS_R2. READS_R1="path/to/your/spike_in_reads_R1.fastq.gz" READS_R2="path/to/your/spike_in_reads_R2.fastq.gz" # Path to your pre-built STAR genome index for S. cerevisiae (sacCer3) # This index needs to be generated once using the sacCer3 fasta and GTF files. # Example for building index (run once): # STAR --runMode genomeGenerate \ # --genomeDir sacCer3_star_index \ # --genomeFastaFiles sacCer3.fasta \ # --sjdbGTFfile sacCer3.gtf \ # --runThreadN 8 # Adjust threads as needed GENOME_DIR="sacCer3_star_index" OUTPUT_PREFIX="aligned_sacCer3_spikein" # Prefix for output files # Run STAR alignment for spike-in reads # Adjust --readFilesIn for single-end reads (e.g., --readFilesIn ${READS_R1}) STAR --genomeDir ${GENOME_DIR} \ --readFilesIn ${READS_R1} ${READS_R2} \ --runThreadN 8 \ --outFileNamePrefix ${OUTPUT_PREFIX}. \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes All \ --outFilterMultimapNmax 20 \ --outFilterMismatchNmax 999 \ --outFilterMismatchNoverLmax 0.04 \ --alignIntronMin 20 \ --alignIntronMax 1000000 \ --alignMatesGapMax 1000000 \ --limitBAMsortRAM 30000000000 # Adjust based on available RAM (e.g., 30GB) # Rename the output BAM file for clarity mv ${OUTPUT_PREFIX}.Aligned.sortedByCoord.out.bam ${OUTPUT_PREFIX}.bam -
7
Peaks were called from the pooled replicates against the IgG using MACS2 v.
$ Bash example
# Install MACS2 (if not already installed) # conda install -c bioconda macs2=2.2.7.1 # Run MACS2 peak calling # Assuming 'treatment.pooled.bam' is the pooled replicates BAM and 'control.pooled.bam' is the pooled IgG control BAM. # '-g hs' specifies the genome size for human (hg38 equivalent). Adjust if a different organism is used. # '-f BAM' specifies the input file format as BAM. Use 'BAMPE' for paired-end data. # '-q 0.01' sets the minimum FDR (q-value) cutoff for peak detection. # '--outdir' specifies the output directory. # '-n' specifies the prefix for output files. macs2 callpeak \ -t treatment.pooled.bam \ -c control.pooled.bam \ -f BAM \ -g hs \ -q 0.01 \ --outdir ./ \ -n macs2_peaks
-
8
2.2.5 with normalizing the samples based on the spike-in reads qunatification.
DESeq2 (Inferred with models/gemini-2.5-flash) v1.38.0 (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Install R and DESeq2 if not already present # conda install -c bioconda bioconductor-deseq2 r-base # Example: Create dummy spike_in_counts.tsv and raw_counts.tsv for demonstration # In a real pipeline, these files would be generated by upstream quantification steps. # spike_in_counts.tsv: Contains spike-in read counts per sample # raw_counts.tsv: Contains raw feature (e.g., gene) counts per sample echo -e "sample_id\tspike_in_reads" > spike_in_counts.tsv echo -e "sample1\t1000" >> spike_in_counts.tsv echo -e "sample2\t1500" >> spike_in_counts.tsv echo -e "sample3\t800" >> spike_in_counts.tsv echo -e "gene_id\tsample1\tsample2\tsample3" > raw_counts.tsv echo -e "geneA\t100\t150\t80" >> raw_counts.tsv echo -e "geneB\t200\t300\t160" >> raw_counts.tsv echo -e "geneC\t50\t75\t40" >> raw_counts.tsv # R script to perform spike-in normalization using DESeq2 Rscript -e ' library(DESeq2) # Load spike-in counts spike_in_data <- read.delim("spike_in_counts.tsv", row.names = 1) spike_in_counts <- as.numeric(spike_in_data$spike_in_reads) names(spike_in_counts) <- rownames(spike_in_data) # Calculate custom size factors based on spike-in reads # A common method is to normalize to the median spike-in count across samples. # Each sample''s counts will be scaled by (median_spike_in / sample_spike_in_count). median_spike_in <- median(spike_in_counts) custom_size_factors <- median_spike_in / spike_in_counts # Load raw feature counts (e.g., gene counts) raw_counts_data <- read.delim("raw_counts.tsv", row.names = 1) # Ensure sample order in custom_size_factors matches the column order in raw_counts_data sample_names <- colnames(raw_counts_data) custom_size_factors <- custom_size_factors[sample_names] # Create colData for DESeqDataSet (minimal, as we are only normalizing) colData <- data.frame(row.names = sample_names, condition = rep("control", length(sample_names))) # Create DESeqDataSet object dds <- DESeqDataSetFromMatrix(countData = raw_counts_data, colData = colData, design = ~ 1) # Simple design as we are only normalizing # Assign custom size factors derived from spike-in reads sizeFactors(dds) <- custom_size_factors # Run DESeq to apply normalization. # estimateDispersions is needed for the counts(dds, normalized=TRUE) function to work correctly, # even if not performing full differential expression analysis. dds <- estimateDispersions(dds) # Get normalized counts normalized_counts <- counts(dds, normalized = TRUE) # Write normalized counts to an output file write.table(as.data.frame(normalized_counts), "normalized_counts_spikein.tsv", sep = "\t", quote = FALSE, col.names = NA) ' -
9
ChipPeakAnnot library in the R package v.3.6.1 was used to determine the overlap between peak sets and determine the shared and specific peaks represented in the Venn diagrams.
$ Bash example
#!/bin/bash # This script uses the ChIPpeakAnno R package to determine overlap between peak sets # and generate Venn diagrams, as described in the context. # Ensure R is installed and available in the PATH. The context specifies R v.3.6.1. # You might need to use a specific R environment or installation for this version. # Example for installing R (adjust for your OS/package manager and desired version): # sudo apt-get update # sudo apt-get install r-base=3.6.1-3bionic # Example for Ubuntu 18.04 # Install BiocManager if not already installed # R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager', repos='https://cloud.r-project.org')" # Install ChIPpeakAnno package from Bioconductor # R -e "BiocManager::install('ChIPpeakAnno')" # Define input peak files (replace with actual paths to your BED files). # These are placeholders. In a real pipeline, these would be outputs from a peak caller. # ChIPpeakAnno's makeVennDiagram function supports up to 5 peak sets. PEAK_FILE_1="sample1_peaks.bed" PEAK_FILE_2="sample2_peaks.bed" PEAK_FILE_3="sample3_peaks.bed" # Example for a 3-way Venn diagram # PEAK_FILE_4="sample4_peaks.bed" # PEAK_FILE_5="sample5_peaks.bed" # Define output PDF file for the Venn diagram OUTPUT_VENN_PDF="peak_overlap_venn_diagram.pdf" # Create an R script to perform the peak overlap and Venn diagram generation cat <<EOF > run_chippeakanno.R library(ChIPpeakAnno) library(GenomicRanges) # Often loaded implicitly or explicitly with ChIPpeakAnno # Function to load peaks from a BED file into a GRanges object # This function assumes a standard BED format (at least chr, start, end). # Adjust 'col.names' and other parameters if your BED file has a different structure. load_peaks <- function(file_path) { if (!file.exists(file_path)) { stop(paste("Peak file not found:", file_path)) } # Read BED file, skipping comment lines (starting with #) peaks_df <- read.delim(file_path, header=FALSE, comment.char="#", stringsAsFactors=FALSE) # Ensure at least 3 columns for seqnames (chromosome), start, end if (ncol(peaks_df) < 3) { stop(paste("Invalid BED file format for", file_path, ": expected at least 3 columns.")) } # Assign standard column names for GRanges conversion colnames(peaks_df)[1:3] <- c("seqnames", "start", "end") # Create GRanges object gr <- makeGRangesFromDataFrame(peaks_df, keep.extra.columns=TRUE, # Keep any additional columns (e.g., name, score) ignore.strand=TRUE, # Peaks are typically strand-agnostic for overlap analysis seqinfo=NULL) # Can provide seqinfo if known, but often not strictly necessary for overlap return(gr) } # Load peak sets into a list of GRanges objects peaklist <- list() # Check if peak files exist before attempting to load them if (file.exists("${PEAK_FILE_1}")) { peaklist[["Set1"]] <- load_peaks("${PEAK_FILE_1}") } else { message(paste("Warning: Peak file not found, skipping:", "${PEAK_FILE_1}")) } if (file.exists("${PEAK_FILE_2}")) { peaklist[["Set2"]] <- load_peaks("${PEAK_FILE_2}") } else { message(paste("Warning: Peak file not found, skipping:", "${PEAK_FILE_2}")) } if (file.exists("${PEAK_FILE_3}")) { peaklist[["Set3"]] <- load_peaks("${PEAK_FILE_3}") } else { message(paste("Warning: Peak file not found, skipping:", "${PEAK_FILE_3}")) } # Add more peak files here if needed (up to 5 for makeVennDiagram) # if (file.exists("${PEAK_FILE_4}")) { # peaklist[["Set4"]] <- load_peaks("${PEAK_FILE_4}") # } # if (file.exists("${PEAK_FILE_5}")) { # peaklist[["Set5"]] <- load_peaks("${PEAK_FILE_5}") # } # Ensure at least two peak files were successfully loaded for a Venn diagram if (length(peaklist) < 2) { stop("At least two peak files are required to generate a Venn diagram. Please check input file paths.") } # Determine overlap and generate Venn diagram # The 'makeVennDiagram' function visualizes shared and specific peaks. pdf("${OUTPUT_VENN_PDF}") venn_results <- makeVennDiagram( peaklist, NameOfPeaks = names(peaklist), # Labels for each set in the diagram main = "Overlap of Peak Sets", # Main title for the plot fill = c("red", "blue", "green", "yellow", "purple")[1:length(peaklist)], # Customize fill colors cat.col = c("red", "blue", "green", "yellow", "purple")[1:length(peaklist)], # Customize category label colors cex = 1.5, # Size of the numbers in the diagram cat.cex = 1.5, # Size of the category labels ind = TRUE # Set to TRUE to get indices of overlapping peaks in the results object ) dev.off() message(paste("Venn diagram showing peak overlaps saved to:", "${OUTPUT_VENN_PDF}")) # Optional: To extract specific overlapping/unique peak sets as GRanges objects. # This can be done using the 'venn_results' object returned by makeVennDiagram. # For example, to get peaks unique to Set1 (assuming 3 sets): # if (length(peaklist) == 3) { # unique_set1_indices <- venn_results$vennCounts[venn_results$vennCounts$Set1 == 1 & venn_results$vennCounts$Set2 == 0 & venn_results$vennCounts$Set3 == 0, "idx"] # if (length(unique_set1_indices) > 0) { # unique_peaks_set1 <- peaklist[["Set1"]][unique_set1_indices] # # write.table(as.data.frame(unique_peaks_set1), "unique_peaks_set1.bed", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE) # message(paste("Number of peaks unique to Set1:", length(unique_peaks_set1))) # } # } EOF # Execute the R script using the specified R version (3.6.1) Rscript run_chippeakanno.R -
10
Minimum peak overlap was set to 100 bp to determine the intersection peaks between the two samples.
$ Bash example
# Install bedtools if not already installed # conda install -c bioconda bedtools # Define input and output files # Replace 'sample1_peaks.bed' and 'sample2_peaks.bed' with your actual peak file paths. INPUT_PEAKS_SAMPLE1="sample1_peaks.bed" INPUT_PEAKS_SAMPLE2="sample2_peaks.bed" OUTPUT_INTERSECTION_PEAKS="intersection_peaks.bed" # Determine the intersection peaks between the two samples with a minimum overlap of 100 bp. # bedtools intersect outputs the overlapping segments between two BED files. # The awk command then filters these segments, keeping only those whose length (end - start) is 100 bp or greater. bedtools intersect -a "${INPUT_PEAKS_SAMPLE1}" -b "${INPUT_PEAKS_SAMPLE2}" | awk '($3-$2) >= 100' > "${OUTPUT_INTERSECTION_PEAKS}" # Note: The input peak files (e.g., sample1_peaks.bed, sample2_peaks.bed) are assumed to be in BED format # and based on a specific reference genome (e.g., hg38, mm10), though the genome assembly is not directly # used by bedtools intersect itself. -
11
Shared peaks between the sets ETO and Flag antibodies in the Control and miR-130a KD samples were used for the downstream analysis.
intersect_peaks.py (Inferred with models/gemini-2.5-flash) vlatest (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Clone the yeolab/merge_peaks repository if not already available # git clone https://github.com/yeolab/merge_peaks.git # cd merge_peaks # Assuming peak files from previous steps are available in BED format. # Replace these placeholder paths/names with your actual peak files. ETO_Control_peaks="ETO_Control_reproducible_peaks.bed" Flag_Control_peaks="Flag_Control_reproducible_peaks.bed" ETO_KD_peaks="ETO_KD_reproducible_peaks.bed" Flag_KD_peaks="Flag_KD_reproducible_peaks.bed" # Output file for the shared peaks output_shared_peaks="shared_peaks_ETO_Flag_Control_KD.bed" # Execute the intersect_peaks.py script to find peaks common to all specified sets. # This command assumes the script is in the current working directory or in PATH. python intersect_peaks.py \ "${ETO_Control_peaks}" \ "${Flag_Control_peaks}" \ "${ETO_KD_peaks}" \ "${Flag_KD_peaks}" \ -o "${output_shared_peaks}" -
12
These two sets of shared peaks were intersected again using ChipPeakAnnot library to determine the shared peaks between the two sets and the unique peaks to each of the two sets.
$ Bash example
# The description refers to 'ChipPeakAnnot library' which appears to be a custom or less common Python library. # The functionality described (intersecting peak sets and identifying unique peaks) is commonly and robustly performed using bedtools. # The following commands demonstrate how to achieve the described operations using bedtools. # Install bedtools if not already available # conda install -c bioconda bedtools # Define input peak files (replace with actual filenames) INPUT_PEAKS_SET1="input_peaks_set1.bed" INPUT_PEAKS_SET2="input_peaks_set2.bed" # Define output files SHARED_PEAKS="shared_peaks.bed" UNIQUE_TO_SET1="unique_to_set1.bed" UNIQUE_TO_SET2="unique_to_set2.bed" # Determine shared peaks between the two sets # By default, bedtools intersect outputs the overlapping regions. bedtools intersect -a "${INPUT_PEAKS_SET1}" -b "${INPUT_PEAKS_SET2}" > "${SHARED_PEAKS}" # Determine unique peaks to Set 1 (peaks in Set 1 that do not overlap with Set 2) bedtools intersect -a "${INPUT_PEAKS_SET1}" -b "${INPUT_PEAKS_SET2}" -v > "${UNIQUE_TO_SET1}" # Determine unique peaks to Set 2 (peaks in Set 2 that do not overlap with Set 1) bedtools intersect -a "${INPUT_PEAKS_SET2}" -b "${INPUT_PEAKS_SET1}" -v > "${UNIQUE_TO_SET2}" -
13
Peaks were annotated to the genomic features of the hg38 human genomes.
HOMER (Hypergeometric Optimization of Motif EnRichment) - annotatePeaks.pl (Inferred with models/gemini-2.5-flash) vv4.11 GitHub$ Bash example
# Install HOMER (if not already installed) # conda install -c bioconda homer # Example: Download and install hg38 genome for HOMER (if not already done) # perl /path/to/homer/bin/configureHomer.pl -install hg38 # Define input peak file (replace with your actual peak file, e.g., a BED file) input_peaks="input_peaks.bed" # Define output annotation file output_annotation="annotated_peaks.txt" # Annotate peaks to hg38 genomic features # The 'hg38' argument tells HOMER to use its pre-indexed hg38 genome annotation. annotatePeaks.pl "${input_peaks}" hg38 > "${output_annotation}" -
14
HOMER v.
$ Bash example
# Install HOMER (example using conda) # conda install -c bioconda homer # Placeholder variables - replace with actual paths and values from your experiment # Input peak file (e.g., BED format) from a previous step (e.g., peak calling) INPUT_PEAKS="your_experiment_peaks.bed" # Reference genome assembly (e.g., hg38, mm10). # Ensure HOMER indexes for this genome are available (e.g., by running 'perl configureHomer.pl -install hg38'). GENOME_ASSEMBLY="hg38" # Output directory for HOMER results OUTPUT_DIR="homer_motif_results" # Motif lengths to search for (e.g., 8, 10, 12 base pairs) MOTIF_LENGTHS="8,10,12" # Example: Run HOMER's findMotifsGenome.pl for de novo motif discovery # This command searches for motifs in the provided peak regions against the specified genome. # Parameters like -size and -len are common. -size given is typically used for BED files with varying region sizes. # Adjust parameters (-mask, -mis, -h, -p, etc.) based on specific experimental needs. findMotifsGenome.pl "${INPUT_PEAKS}" "${GENOME_ASSEMBLY}" "${OUTPUT_DIR}" -size given -len "${MOTIF_LENGTHS}" -
15
4.8 was used for de novo motif analysis of the sets of peaks.
MEME-ChIP (Inferred with models/gemini-2.5-flash) v5.4.1 (Inferred with models/gemini-2.5-flash) GitHub$ Bash example
# Install MEME Suite (e.g., via Bioconda) # conda install -c bioconda meme=5.4.1 # Placeholder for input peak file (e.g., a BED file containing peak regions) PEAKS_BED="your_input_peaks.bed" # Placeholder for reference genome FASTA file # This file is used by meme-chip to extract sequences corresponding to the peaks. REFERENCE_GENOME_FASTA="path/to/your/reference_genome.fa" # e.g., hg38.fa # Output directory for the motif analysis results OUTPUT_DIR="de_novo_motif_analysis_results" # Create the output directory if it does not exist mkdir -p "${OUTPUT_DIR}" # Run MEME-ChIP for de novo motif analysis # -oc: Output directory # -genome: Path to the reference genome FASTA file # -fimo-skip: Skip FIMO analysis (can be run separately if needed) # -meme-maxw: Maximum width of motifs to find by MEME # -meme-minw: Minimum width of motifs to find by MEME # -meme-nmotifs: Number of motifs to find by MEME # -dreme-e: E-value threshold for DREME # -dna: Specifies that the input sequences are DNA # The last argument is the input peak file (BED format is accepted with -genome) meme-chip -oc "${OUTPUT_DIR}" \ -genome "${REFERENCE_GENOME_FASTA}" \ -fimo-skip \ -meme-maxw 20 \ -meme-minw 6 \ -meme-nmotifs 10 \ -dreme-e 0.05 \ -dna "${PEAKS_BED}"
Raw Source Text
Library strategy: CUT&RUN CUT&RUN paired-end data was trimmed using fastp v. 0.19.5 to remove adapters and low quality base pairs with base pair quality score <30 and read length <35 bp. Trimmed reads were aligned to the hg38 human reference genome using bowtie2 v. 2.3.5. Reads were also aligned to the S. cerevisiae yeast genome (sacCer3) yeast genome to evaluate the yeast spike-in control for the normalization. Peaks were called from the pooled replicates against the IgG using MACS2 v. 2.2.5 with normalizing the samples based on the spike-in reads qunatification. ChipPeakAnnot library in the R package v.3.6.1 was used to determine the overlap between peak sets and determine the shared and specific peaks represented in the Venn diagrams. Minimum peak overlap was set to 100 bp to determine the intersection peaks between the two samples. Shared peaks between the sets ETO and Flag antibodies in the Control and miR-130a KD samples were used for the downstream analysis. These two sets of shared peaks were intersected again using ChipPeakAnnot library to determine the shared peaks between the two sets and the unique peaks to each of the two sets. Peaks were annotated to the genomic features of the hg38 human genomes. HOMER v. 4.8 was used for de novo motif analysis of the sets of peaks. Genome_build: hg38 Supplementary_files_format_and_content: Bed files