GSE77700 Processing Pipeline

RIP-Seq, 3 steps

Publication

Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses.

Nature Communications (2016), PMID 27378374

Dataset

Distinct and shared molecular targets and functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analy…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Sequenced reads were trimmed for adaptor sequence and masked for low-complexity or low-quality sequence, then mapped to the mm9 or hg18 whole genome using Bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata.

Bowtie v0.12.2

$ Bash example

# Install Bowtie (version 0.12.2 is quite old, may require specific channels or manual compilation)
# conda install -c bioconda bowtie=0.12.2

# Define the genome assembly to use (mm9 or hg18)
# Replace with your actual indexed genome path
GENOME_ASSEMBLY="mm9" # or "hg18"
BOWTIE_INDEX_BASE="/path/to/bowtie_indexes/${GENOME_ASSEMBLY}"

# Define input reads file (after trimming and masking)
INPUT_READS="trimmed_and_masked_reads.fastq"

# Define output SAM file
OUTPUT_SAM="${GENOME_ASSEMBLY}_alignments.sam"

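# Run Bowtie with the parameters reported in the source:
#   -q                FASTQ input
#   -p 4              4 threads
#   -e 100            maximum summed quality value at mismatched positions
#   -y                try hard to find valid alignments (slower)
#   -a -m 10          report all valid alignments, but drop reads with more than 10
#   --best --strata   keep only alignments in the best mismatch stratum
# Note: -S (SAM output) is NOT among the source parameters. It is added here as an
# assumption so the output matches the .sam file name; without -S, Bowtie 1 writes
# its own tab-delimited format.
bowtie -q -p 4 -e 100 -y -a -m 10 --best --strata -S "${BOWTIE_INDEX_BASE}" "${INPUT_READS}" > "${OUTPUT_SAM}"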
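
The next step starts from duplicate-removed BAM files, but the source does not say how duplicates were removed. As a rough illustration only, the sketch below assumes position-based collapsing (one alignment kept per chromosome, strand, and leftmost position) and works on SAM text with awk. The function name `collapse_duplicates` and the demo reads are hypothetical; established tools such as samtools or Picard would normally be run on sorted BAM files instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep the first alignment seen for each
# (chromosome, strand, leftmost position) and drop later ones.
# Simplification: minus-strand reads are also keyed on the leftmost
# position (SAM POS), not on their 5' end.
collapse_duplicates() {
    awk -F'\t' '
        /^@/ { print; next }                      # pass SAM header lines through
        int($2 / 4) % 2 { next }                  # skip unmapped reads (FLAG bit 0x4)
        {
            strand = (int($2 / 16) % 2) ? "-" : "+"   # FLAG bit 0x10 = reverse strand
            key = $3 SUBSEP strand SUBSEP $4
            if (!(key in seen)) { seen[key] = 1; print }
        }
    ' "$1"
}

# Tiny made-up SAM file, for demonstration only
demo_sam="$(mktemp)"
printf '@SQ\tSN:chr1\tLN:1000\n' > "$demo_sam"
printf 'r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$demo_sam"
printf 'r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$demo_sam"
printf 'r3\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"
printf 'r4\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  >> "$demo_sam"

dedup_sam="$(mktemp)"
collapse_duplicates "$demo_sam" > "$dedup_sam"
cat "$dedup_sam"
```

In the demo, r2 duplicates r1 and is dropped, while r3 is kept because it maps to the opposite strand. Because the alignment step reports up to 10 alignments per read, each alignment is treated independently here.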

WIG files are strand-specific read densities generated using custom scripts from duplicate-removed BAM files.

bedtools v2.29.2 (tool and version inferred with models/gemini-2.5-flash; the source says only "custom scripts")

$ Bash example

# conda install -c bioconda bedtools

# Define input and output paths
INPUT_BAM="input.dedup.bam" # Path to the duplicate-removed BAM file
OUTPUT_PREFIX="output"      # Prefix for output wiggle files

# Note: with -ibam, bedtools genomecov reads chromosome names and sizes from the
# BAM header, so no genome FASTA or chrom.sizes file is needed for the commands below.
# The BAM should be coordinate-sorted and aligned to the assembly reported in the
# source (mm9 or hg18).

# Generate plus-strand read densities.
# bedtools genomecov -bg writes bedGraph (chrom, 0-based start, end, score).
# bedGraph is a separate format from WIG, so the track line declares type=bedGraph
# and the output files use a .bedGraph extension.
echo 'track type=bedGraph name="plus_strand_coverage" description="Plus strand read densities" visibility=full autoScale=on color=0,0,255' > "${OUTPUT_PREFIX}_plus.bedGraph"
bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand + | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_plus.bedGraph"

# Generate minus-strand read densities
echo 'track type=bedGraph name="minus_strand_coverage" description="Minus strand read densities" visibility=full autoScale=on color=255,0,0' > "${OUTPUT_PREFIX}_minus.bedGraph"
bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand - | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_minus.bedGraph"
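
bedtools genomecov -bg emits bedGraph rows (0-based, half-open intervals), whereas variableStep WIG lists 1-based positions. If true WIG output is wanted, a conversion along these lines may work. This is a minimal sketch, not the authors' custom script: `bedgraph_to_wig` and the demo data are hypothetical, and the input is assumed to be tab-delimited and sorted by chromosome.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: expand each bedGraph interval into per-base
# variableStep WIG lines. A bedGraph interval [start, end) covers the
# 1-based positions start+1 .. end.
bedgraph_to_wig() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }             # skip track/browser/comment lines
        $1 != chrom {                             # new chromosome: emit a declaration line
            chrom = $1
            print "variableStep chrom=" chrom " span=1"
        }
        { for (pos = $2 + 1; pos <= $3; pos++) print pos, $4 }
    ' "$1"
}

# Tiny made-up bedGraph, for demonstration only
demo_bg="$(mktemp)"
printf 'chr1\t0\t2\t3\n'   >  "$demo_bg"
printf 'chr1\t5\t6\t1\n'   >> "$demo_bg"
printf 'chr2\t10\t12\t2\n' >> "$demo_bg"

wig_out="$(bedgraph_to_wig "$demo_bg")"
printf '%s\n' "$wig_out"
```

Per-base expansion keeps the sketch short but produces large files for deep coverage; writing each interval as one line with a matching `span` would be more compact.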

BED files represent CLIP-seq peaks and were generated using an in-house peak-finding algorithm.

CLIP-seq peak finder, version not specified (tool inferred with models/gemini-2.5-flash)

$ Bash example

# Install CLIPper (if not already installed).
# The source says only "in-house peak finding algorithm"; CLIPper is an inferred
# stand-in, not a confirmed tool. See the repository README for install steps:
# git clone https://github.com/yeolab/clipper.git
# cd clipper

# Placeholder variables - replace with actual file paths and species
# INPUT_BAM: Path to the sorted, indexed CLIP-seq alignment file (BAM format)
# SPECIES: Genome assembly identifier. The source reports mm9 (or hg18) alignments;
#          check `clipper --help` for the assemblies your installed version supports.
# OUTPUT_BED: Output BED file of called peaks

INPUT_BAM="path/to/your/clip_seq_sample.bam"
SPECIES="mm9"
OUTPUT_BED="clip_seq_peaks.bed"

# Execute CLIPper peak calling.
# CLIPper takes a single BAM via -b/--bam, the species via -s/--species, and the
# output file via -o/--outfile. Verify the flags against `clipper --help`.
clipper \
    -b "${INPUT_BAM}" \
    -s "${SPECIES}" \
    -o "${OUTPUT_BED}"

# Peaks are written to ${OUTPUT_BED}
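
Because the peak finder itself is not described, one thing that can be checked independently is the resulting BED file. The sketch below assumes a tab-delimited BED with strand in column 6 and reports, per strand, the peak count and mean peak length, plus the number of malformed rows. `summarize_peaks` and the demo peaks are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: per-strand peak count and mean length for a BED file.
# Rows with fewer than 3 columns, non-integer coordinates, or end <= start
# are counted as invalid instead of being summarized.
summarize_peaks() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $3 <= $2 { bad++; next }
        {
            strand = (NF >= 6 && ($6 == "+" || $6 == "-")) ? $6 : "."
            n[strand]++
            len[strand] += $3 - $2
        }
        END {
            for (s in n) printf "%s\t%d\t%.1f\n", s, n[s], len[s] / n[s]
            printf "invalid\t%d\n", bad + 0
        }
    ' "$1" | LC_ALL=C sort
}

# Tiny made-up BED file, for demonstration only
demo_bed="$(mktemp)"
printf 'chr1\t100\t150\tpeak1\t0\t+\n' >  "$demo_bed"
printf 'chr1\t300\t380\tpeak2\t0\t+\n' >> "$demo_bed"
printf 'chr2\t50\t90\tpeak3\t0\t-\n'   >> "$demo_bed"
printf 'chr2\t90\t50\tbad\t0\t-\n'     >> "$demo_bed"

summary="$(summarize_peaks "$demo_bed")"
printf '%s\n' "$summary"
```

Each output row is strand, peak count, and mean length in bases; the final `invalid` row counts rows that failed the coordinate checks.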

Tools Used

CLIP-seq

Raw Source Text

Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 or hg18 whole genome using bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata
wig files are strand specific read densities generated using custom scripts from duplicate removed bam files.
bed files represent CLIP-seq peaks and were generated using an in-house peak finding algorithm.
Genome_build: mm9
Supplementary_files_format_and_content: bed files of peaks called and wiggle files of read densities across the genome
