GSE77700 Processing Pipeline
RIP-Seq
3 steps
Publication
Distinct and shared functions of ALS-associated proteins TDP-43, FUS and TAF15 revealed by multisystem analyses. Nature Communications (2016), PMID 27378374
Dataset
GSE77700: Distinct and shared molecular targets and functions of ALS-associated TDP-43, FUS, and TAF15 revealed by comprehensive multi-system integrative analy…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 or hg18 whole genome using bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata
$ Bash example
# Install Bowtie 1 (version 0.12.2 is old and may require a specific channel or manual compilation)
# conda install -c bioconda bowtie=0.12.2

# Genome assembly to map against (mm9 or hg18)
GENOME_ASSEMBLY="mm9"  # or "hg18"
# Replace with the basename of your Bowtie index
BOWTIE_INDEX_BASE="/path/to/bowtie_indexes/${GENOME_ASSEMBLY}"

# Input reads (after adaptor trimming and masking)
INPUT_READS="trimmed_and_masked_reads.fastq"

# Output SAM file
OUTPUT_SAM="${GENOME_ASSEMBLY}_alignments.sam"

# Run Bowtie with the parameters given in the source.
# -S is an addition not listed in the source: without it, Bowtie 1 writes its
# native alignment format rather than SAM.
bowtie -q -p 4 -e 100 -y -a -m 10 --best --strata -S \
  "${BOWTIE_INDEX_BASE}" "${INPUT_READS}" > "${OUTPUT_SAM}"
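The step description mentions adaptor trimming before mapping, which the example above takes as already done. A minimal sketch of that idea follows, assuming a 3' adaptor, exact matching only, and a four-line FASTQ layout. The function name `trim_adaptor` and the adaptor sequence passed to it are hypothetical; the source does not say which trimming tool or adaptor was used, and dedicated trimmers also handle mismatches and partial adaptors, which this does not.

```shell
# Hypothetical helper: trim an exact 3' adaptor match from 4-line FASTQ records.
# Usage: trim_adaptor ADAPTOR_SEQUENCE < in.fastq > out.fastq
trim_adaptor() {
  awk -v adaptor="$1" '
    NR % 4 == 1 { header = $0; next }   # read name line
    NR % 4 == 2 { seq = $0; next }      # bases
    NR % 4 == 3 { plus = $0; next }     # separator line
    {                                   # fourth line: quality string
      qual = $0
      cut = index(seq, adaptor)         # 1-based position of first exact match, 0 if none
      if (cut > 0) {
        seq  = substr(seq, 1, cut - 1)
        qual = substr(qual, 1, cut - 1)
      }
      if (length(seq) > 0)              # drop reads that are entirely adaptor
        print header "\n" seq "\n" plus "\n" qual
    }'
}
```

It could be applied as `trim_adaptor "$ADAPTOR" < raw_reads.fastq > trimmed_reads.fastq`, where `ADAPTOR` and both file names are placeholders.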
2
wig files are strand specific read densities generated using custom scripts from duplicate removed bam files.
$ Bash example
# conda install -c bioconda bedtools
# Note: the source used custom scripts; bedtools is a stand-in here, not the authors' method.

# Input and output paths
INPUT_BAM="input.dedup.bam"   # duplicate-removed, coordinate-sorted BAM file
OUTPUT_PREFIX="output"        # prefix for the output density files

# Plus-strand read densities.
# bedtools genomecov -bg emits bedGraph records (chrom, start, end, score), so the
# track line declares type=bedGraph; the .wig extension only mirrors the source's naming.
echo 'track type=bedGraph name="plus_strand_coverage" description="Plus strand read densities" visibility=full autoScale=on color=0,0,255' > "${OUTPUT_PREFIX}_plus.wig"
bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand + | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_plus.wig"

# Minus-strand read densities
echo 'track type=bedGraph name="minus_strand_coverage" description="Minus strand read densities" visibility=full autoScale=on color=255,0,0' > "${OUTPUT_PREFIX}_minus.wig"
bedtools genomecov -ibam "${INPUT_BAM}" -bg -strand - | sort -k1,1 -k2,2n >> "${OUTPUT_PREFIX}_minus.wig"
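Step 2 starts from duplicate-removed BAM files, but the source does not say how duplicates were removed. One common approach can be sketched as follows, assuming single-end reads and treating alignments that share chromosome, strand, and leftmost position as duplicates. The function name `collapse_duplicates` is hypothetical, and using the leftmost position for reverse-strand reads is a simplification that dedicated duplicate-marking tools avoid.

```shell
# Hypothetical helper: keep the first alignment seen for each
# (chromosome, strand, leftmost position) in SAM text read from stdin.
collapse_duplicates() {
  awk -F '\t' '
    /^@/ { print; next }                       # pass header lines through unchanged
    int($2 / 4) % 2 { next }                   # drop unmapped reads (FLAG bit 0x4)
    {
      strand = (int($2 / 16) % 2) ? "-" : "+"  # FLAG bit 0x10 marks reverse-strand reads
      key = $3 SUBSEP strand SUBSEP $4         # RNAME, strand, POS
      if (!(key in seen)) { seen[key] = 1; print }
    }'
}
```

With samtools available, this could sit in a pipe such as `samtools view -h sorted.bam | collapse_duplicates | samtools view -b -o dedup.bam -`, where both file names are placeholders.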
3
bed files represent CLIP-seq peaks and were generated using an in-house peak finding algorithm.
$ Bash example
# CLIPper is used here as a stand-in; the source only says "an in-house peak finding algorithm".
# Install CLIPper if needed, e.g. by cloning the repository and following its README:
# git clone https://github.com/yeolab/clipper.git
# cd clipper

# Placeholder variables: replace with your actual paths and genome
INPUT_BAM="path/to/your/clip_seq_sample.bam"  # CLIP-seq alignments (sorted, indexed BAM)
SPECIES="mm9"                                  # genome build named in the source; use one your CLIPper install supports
OUTPUT_BED="clip_seq_peaks.bed"                # output peak file

# Call peaks on the CLIP-seq BAM; peaks are written to ${OUTPUT_BED}
clipper -b "${INPUT_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"
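Whichever peak caller is used, a quick sanity check of the resulting BED file can catch malformed intervals before downstream analysis. The sketch below assumes a tab-separated six-column BED with the strand in column 6. The function name `summarize_peaks` is hypothetical and is not part of any peak caller.

```shell
# Hypothetical helper: report peak count and mean width per strand for BED6 on stdin.
# Output columns: strand, number of peaks, mean width. Exits non-zero if any
# interval has end <= start.
summarize_peaks() {
  awk -F '\t' '
    /^(track|browser|#)/ { next }                 # skip optional header lines
    $3 + 0 <= $2 + 0 {                            # malformed interval
      printf("bad interval at line %d\n", NR) > "/dev/stderr"
      bad = 1
      next
    }
    { n[$6]++; width[$6] += $3 - $2 }
    END {
      split("+ -", order, " ")                    # fixed output order: plus, then minus
      for (i = 1; i <= 2; i++) {
        s = order[i]
        if (s in n) printf "%s\t%d\t%.1f\n", s, n[s], width[s] / n[s]
      }
      exit bad + 0
    }'
}
```

For example, `summarize_peaks < clip_seq_peaks.bed` prints one line per strand present in the file; the file name is a placeholder.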
Tools Used
Raw Source Text
Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to mm9 or hg18 whole genome using bowtie v0.12.2 with parameters -q -p 4 -e 100 -y -a -m 10 --best --strata
wig files are strand specific read densities generated using custom scripts from duplicate removed bam files.
bed files represent CLIP-seq peaks and were generated using an in-house peak finding algorithm.
Genome_build: mm9
Supplementary_files_format_and_content: bed files of peaks called and wiggle files of read densities across the genome