GSE77700 Processing Pipeline

RIP-Seq | Code examples | 3 steps

Publication

Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses.

Nature Communications (2016), PMID 27378374

Dataset

GSE77700

Distinct and shared molecular targets and functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analy…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Read trimming, masking, and alignment

    Sequenced reads were trimmed of adaptor sequence, masked for low-complexity or low-quality sequence, and then mapped to the mm9 or hg18 whole genome using Bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata.

    Tool: Bowtie v0.12.2
    Bash example:
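    # Install Bowtie 1. Version 0.12.2 is old; bioconda may not carry it, so a
    # manual build from the Bowtie release archive may be needed.
    # conda install -c bioconda bowtie=0.12.2
    
    # Genome assembly used by the authors: mm9 (mouse) or hg18 (human).
    # Replace with the basename of your own Bowtie 1 index (no .ebwt suffix).
    GENOME_ASSEMBLY="mm9" # or "hg18"
    BOWTIE_INDEX_BASE="/path/to/bowtie_indexes/${GENOME_ASSEMBLY}"
    
    # Input reads after adaptor trimming and low-complexity/low-quality masking
    INPUT_READS="trimmed_and_masked_reads.fastq"
    OUTPUT_SAM="${GENOME_ASSEMBLY}_alignments.sam"
    
    # Published parameters: -q FASTQ input, -p 4 threads, -e 100 maximum summed
    # mismatch quality, -y try hard, -a report all alignments, -m 10 suppress
    # reads with more than 10 reportable alignments, --best --strata keep only
    # the best stratum. -S is NOT in the published list; it is added here so the
    # output really is SAM instead of Bowtie's default tabular format.
    bowtie -q -p 4 -e 100 -y -a -m 10 --best --strata -S \
        "${BOWTIE_INDEX_BASE}" "${INPUT_READS}" > "${OUTPUT_SAM}"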
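    The trimming and masking that precede alignment are not specified beyond the sentence above. As a minimal sketch only, the hypothetical helper trim_and_mask below clips each read at the first exact match of an adaptor string and replaces bases whose quality falls below a threshold with N. It assumes 4-line FASTQ records with Phred+33 qualities; the adaptor sequence and the threshold are placeholders you supply, and low-complexity masking (e.g. a DUST-style filter) is not covered.

    ```shell
    # Hypothetical helper (not the authors' tool): trim_and_mask FASTQ ADAPTOR MIN_Q
    # Assumes 4-line FASTQ records and Phred+33 quality encoding.
    trim_and_mask() {
      awk -v adapter="$2" -v minq="$3" '
        # Lookup table: quality character -> Phred score (Phred+33)
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i - 33 }
        NR % 4 == 1 { hdr = $0; next }   # header line
        NR % 4 == 2 { seq = $0; next }   # sequence line
        NR % 4 == 3 { plus = $0; next }  # separator line
        {
          qual = $0
          # Clip sequence and qualities at the first exact adaptor match
          p = (adapter != "") ? index(seq, adapter) : 0
          if (p > 0) { seq = substr(seq, 1, p - 1); qual = substr(qual, 1, p - 1) }
          # Mask bases whose quality is below the threshold with N
          out = ""
          for (i = 1; i <= length(seq); i++)
            out = out (ord[substr(qual, i, 1)] < minq + 0 ? "N" : substr(seq, i, 1))
          print hdr; print out; print plus; print qual
        }
      ' "$1"
    }
    ```

    Usage: trim_and_mask raw.fastq YOUR_ADAPTOR 20 > trimmed_and_masked_reads.fastq. Reads whose adaptor starts at position 1 become empty and may need to be filtered out before running Bowtie.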
  2. Strand-specific read densities (WIG)

    WIG files contain strand-specific read densities, generated with custom scripts from duplicate-removed BAM files.

    Tool: bedtools v2.29.2 (inferred with models/gemini-2.5-flash)
    Bash example:
    # conda install -c bioconda bedtools
    
    # Input: duplicate-removed, coordinate-sorted BAM. The published WIG files
    # came from the authors' custom scripts; bedtools is an inferred substitute.
    INPUT_BAM="input.dedup.bam"
    OUTPUT_PREFIX="output"      # Prefix for output coverage files
    
    # bedtools genomecov -bg writes bedGraph (chrom, 0-based start, end, coverage),
    # which is a different format from WIG, so the track line declares
    # type=bedGraph. With -ibam, chromosome lengths come from the BAM header and
    # no -g chrom.sizes file is required. -strand limits counting to one strand;
    # for libraries whose reads map antisense to the RNA, swap the two labels.
    
    # Plus-strand read densities
    echo 'track type=bedGraph name="plus_strand_coverage" description="Plus strand read densities" visibility=full autoScale=on color=0,0,255' > "${OUTPUT_PREFIX}_plus.bedGraph"
    bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand + | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_plus.bedGraph"
    
    # Minus-strand read densities
    echo 'track type=bedGraph name="minus_strand_coverage" description="Minus strand read densities" visibility=full autoScale=on color=255,0,0' > "${OUTPUT_PREFIX}_minus.bedGraph"
    bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand - | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_minus.bedGraph"
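    bedtools genomecov -bg produces bedGraph intervals, not WIG records. Where a true WIG file is required, a minimal sketch of a conversion is the hypothetical helper bedgraph_to_wig below. It turns each bedGraph interval into a variableStep entry, converting the 0-based start to WIG's 1-based position, and starts a new "variableStep chrom=... span=..." section whenever the chromosome or interval length changes (span must be constant within a section). Track, browser, and comment lines in the input are skipped; add your own track line if needed.

    ```shell
    # Hypothetical helper: bedgraph_to_wig BEDGRAPH > out.wig
    # Input columns: chrom, 0-based start, end (exclusive), value.
    bedgraph_to_wig() {
      awk '
        /^(track|browser|#)/ { next }          # skip header and comment lines
        {
          span = $3 - $2                       # interval length in bases
          # Open a new variableStep section when chrom or span changes
          if ($1 != chr || span != cur_span) {
            printf "variableStep chrom=%s span=%d\n", $1, span
            chr = $1; cur_span = span
          }
          printf "%d %s\n", $2 + 1, $4         # 1-based WIG position and value
        }
      ' "$1"
    }
    ```

    Usage: { echo 'track type=wiggle_0 name="plus_strand_coverage"'; bedgraph_to_wig output_plus.bedGraph; } > output_plus.wig. Emitting one line per interval keeps the file close to bedGraph size instead of expanding to one line per base.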
  3. Peak calling (BED)

    BED files contain CLIP-seq peaks, generated with an in-house peak-finding algorithm.

    Tool: CLIPper, version not specified (inferred with models/gemini-2.5-flash)
    Bash example:
    # Install CLIPper (Yeo lab), a Python package installed from source:
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper && python setup.py install
    
    # CLIPper is an inferred substitute for the authors' in-house peak finder.
    # INPUT_BAM:  CLIP-seq alignments (coordinate-sorted, indexed BAM)
    # SPECIES:    genome identifier supported by your CLIPper install; mm9
    #             matches this dataset's Genome_build
    # OUTPUT_BED: output peak file (BED)
    INPUT_BAM="path/to/your/clip_seq_sample.bam"
    SPECIES="mm9"
    OUTPUT_BED="clip_seq_peaks.bed"
    
    # This call uses the IP BAM only; no control BAM is passed. Normalization
    # against an input or IgG library would be a separate step.
    clipper -b "${INPUT_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"
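    The authors' peak finder is in-house and unspecified, and CLIPper above is only an inferred substitute. To illustrate what peak finding on the strand-specific densities from step 2 involves, here is a deliberately naive threshold caller, the hypothetical helper call_peaks_simple. It merges adjacent bedGraph intervals whose coverage is at or above MIN_COV into BED6 peaks (name peak_N, score = maximum coverage, strand as given). It is neither CLIPper nor the authors' method: it ignores gene structure, background models, and statistical significance.

    ```shell
    # Hypothetical helper: call_peaks_simple BEDGRAPH MIN_COV STRAND > peaks.bed
    # Merges contiguous intervals with coverage >= MIN_COV into one BED6 peak.
    call_peaks_simple() {
      awk -v thr="$2" -v strand="$3" '
        BEGIN { OFS = "\t"; n = 0; inpk = 0 }
        # Print the current peak, if any, and reset state
        function flush() {
          if (inpk) { n++; print chr, s, e, "peak_" n, mx, strand; inpk = 0 }
        }
        /^(track|browser|#)/ { next }
        {
          if ($4 + 0 >= thr + 0) {
            if (inpk && $1 == chr && $2 + 0 == e + 0) {
              e = $3                                   # extend contiguous peak
              if ($4 + 0 > mx) mx = $4 + 0
            } else {
              flush()                                  # close previous peak
              chr = $1; s = $2; e = $3; mx = $4 + 0; inpk = 1
            }
          } else {
            flush()                                    # coverage dropped below threshold
          }
        }
        END { flush() }
      ' "$1"
    }
    ```

    Usage: call_peaks_simple output_plus.bedGraph 10 + > peaks_plus.bed, and likewise for the minus strand with "-". The threshold of 10 is an arbitrary placeholder.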

Raw Source Text
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 or hg18 whole genome using bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata
wig files are strand specific read densities generated using custom scripts from duplicate removed bam files.
bed files represent CLIP-seq peaks and were generated using an in-house peak finding algorithm.
Genome_build: mm9
Supplementary_files_format_and_content: bed files of peaks called and wiggle files of read densities across the genome