GSE90650 Processing Pipeline

ncRNA-Seq code_examples 6 steps

Publication

NEAT1 scaffolds RNA-binding proteins and the Microprocessor to globally enhance pri-miRNA processing.

Nature structural & molecular biology (2017) — PMID 28846091

Dataset

GSE90650

The LncRNA NEAT1 Nests RNA Binding Proteins and the Microprocessor to Globally Enhance Pri-miRNA Processing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3).

    cutadapt v1.8.3 GitHub
    $ Bash example
    # Install cutadapt (version 1.8.3 is quite old, consider a newer version if possible)
    # conda install -c bioconda cutadapt=1.8.3
    
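    # Remove the 3'-adaptor sequence.
    # Note: this command does not handle the 4 nt index at the start of each read.
    # cutadapt's -u/--cut option could trim it, but how the authors "extracted" it is not stated.
    cutadapt -a CTCGTATGCCGTCTTCTGCTTG -o trimmed_reads.fastq.gz reads.fastq.gz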
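
The cutadapt command in this step removes only the 3' adaptor; the 4 nt index mentioned in the description is not handled. The following is a minimal sketch of one way to do it, assuming that "extracted" means moving the index from the start of the read into the read name (the source does not say). The function name `extract_index`, the `_` separator, and the choice to drop reads no longer than the index are all assumptions for illustration, and the input is assumed to be 4-line FASTQ records.

```shell
# Hypothetical helper (not from the source): move the first N nt of each read
# into the read name. Reads 4-line FASTQ records on stdin, writes FASTQ on stdout.
# Usage sketch: zcat reads.fastq.gz | extract_index 4 | gzip > indexed_reads.fastq.gz
extract_index() {
    awk -v n="${1:-4}" '
        NR % 4 == 1 { header = $0; next }   # read name line
        NR % 4 == 2 { seq = $0; next }      # sequence line
        NR % 4 == 3 { plus = $0; next }     # "+" separator line
        {
            # Quality line: the record is now complete.
            if (length(seq) <= n) next      # assumption: drop reads with no insert left
            idx = substr(seq, 1, n)
            sp = index(header, " ")
            if (sp > 0)                     # keep any description after the read ID
                header = substr(header, 1, sp - 1) "_" idx substr(header, sp)
            else
                header = header "_" idx
            print header
            print substr(seq, n + 1)
            print plus
            print substr($0, n + 1)
        }
    '
}
```

Keeping the index in the read name leaves it available after mapping, for example to group reads that carry the same index.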
  2.

    The reads with at least 18nt were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'.

    Bowtie v1.1.1 GitHub
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie
    
    # Ensure the hg19 index is built or available. 
    # If not, you would build it using bowtie-build:
    # bowtie-build <path_to_hg19.fa> hg19
    
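    # Align reads to hg19.
    # -S writes SAM output; without it, Bowtie 1 emits its own default alignment format.
    # The input should be the adaptor-trimmed reads of at least 18 nt.
    bowtie -l 25 -n 2 -k 1 -m 10 -e 200 --best --strata -S hg19 input_reads.fastq > output_aligned_reads.sam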
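
This step maps only reads of at least 18 nt, but the Bowtie command itself applies no length filter. A minimal sketch of such a filter for 4-line FASTQ input follows; the function name `min_length_filter` is hypothetical, and the source does not say which tool the authors used for this. cutadapt's `-m`/`--minimum-length` option can apply the same cutoff during trimming.

```shell
# Hypothetical helper (not from the source): keep only FASTQ records whose
# sequence is at least MIN nt long (default 18). Reads stdin, writes stdout.
# Usage sketch: zcat trimmed_reads.fastq.gz | min_length_filter 18 > input_reads.fastq
min_length_filter() {
    awk -v min="${1:-18}" '
        NR % 4 == 1 { header = $0; next }   # read name line
        NR % 4 == 2 { seq = $0; next }      # sequence line
        NR % 4 == 3 { plus = $0; next }     # "+" separator line
        # Only the quality line reaches this rule, so the record is complete.
        length(seq) >= min { print header; print seq; print plus; print $0 }
    '
}
```

The cutoff is a parameter, so the same helper covers the 16 nt requirement of the small RNA-seq branch.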
  3.

    For each small RNA-seq read, the first 5nt NNNTC was extracted and 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3).

    cutadapt v1.8.3 GitHub
    $ Bash example
    # Install cutadapt if not already installed
    # conda install -c bioconda cutadapt=1.8.3
    
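    # Example usage:
    # Assuming input.fastq.gz is the raw small RNA-seq reads
    # and output.fastq.gz will be the trimmed reads.
    # ^ anchors NNNTC to the 5' end of the read; -n 2 lets cutadapt remove both
    # adaptors from the same read (by default only the best match is removed).
    # This is one possible reading of "extracted"; the authors' exact command is not given.
    cutadapt -g '^NNNTC' \
             -a NNCTCGTATGCCGTCTTCTGCTTG \
             -n 2 \
             -o output.fastq.gz \
             input.fastq.gz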
  4.

    The target insert sequences were required to be at least 16 nt.

    fastp (Inferred with models/gemini-2.5-flash) v0.20.0 GitHub
    $ Bash example
    # Install fastp if not already installed
    # conda install -c bioconda fastp
    
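    # Filter sequences to be at least 16 nt long
    # Assuming single-end input for simplicity. For paired-end, use --in1, --out1, --in2, --out2.
    # fastp also trims adaptors and filters on quality by default; -A and -Q disable
    # those so that only the length filter is applied.
    fastp --in1 input.fastq.gz --out1 output.fastq.gz --length_required 16 -A -Q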
  5.

    The filtered reads were mapped to the genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'.

    Bowtie v1.1.1 GitHub
    $ Bash example
    # Install Bowtie (if not already installed)
    # conda install -c bioconda bowtie=1.1.1
    
    # Placeholder for Bowtie index for hg19. This index needs to be built prior to alignment.
    # Example: bowtie-build <hg19.fa> hg19_index
    hg19_index="/path/to/bowtie_indexes/hg19"
    
    # Placeholder for input filtered reads file
    filtered_reads="filtered_reads.fastq"
    
    # Placeholder for output SAM file
    output_sam="mapped_reads.sam"
    
    # -S writes SAM output; without it, Bowtie 1 emits its own default alignment format.
    bowtie -n 0 -e 80 -l 18 -a -m 5 --best --strata -S "${hg19_index}" "${filtered_reads}" > "${output_sam}"
  6.

    The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).

    samtools v0.1.19 GitHub
    $ Bash example
    # Install bedtools (version 2.24) and samtools (version 0.1.19) if not already installed
    # conda install -c bioconda bedtools=2.24 samtools=0.1.19
    
    # Install UCSC tools (bedGraphToBigWig) if not already installed
    # wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig -O bedGraphToBigWig
    # chmod +x bedGraphToBigWig
    
    # Define input and output files
    INPUT_BAM="input.sorted.bam" # Placeholder: Assuming input is a sorted and indexed BAM file
    OUTPUT_BEDGRAPH="output.bedgraph"
    OUTPUT_BIGWIG="output.bigWig"
    CHROM_SIZES="hg19.chrom.sizes" # Placeholder: chromosome sizes for hg19, the build used for mapping
    
    # Example of how to get chrom.sizes for hg19 (if not already available)
    # fetchChromSizes hg19 > hg19.chrom.sizes
    
    # Generate bedGraph from BAM using genomeCoverageBed
    # samtools 0.1.19 might have been used for sorting/indexing the BAM prior to this step.
    # The output is sorted by chromosome and start position, as bedGraphToBigWig requires.
    # For per-strand tracks, genomeCoverageBed also accepts -strand + or -strand -.
    genomeCoverageBed -ibam "$INPUT_BAM" -bg -g "$CHROM_SIZES" | LC_ALL=C sort -k1,1 -k2,2n > "$OUTPUT_BEDGRAPH"
    
    # Convert bedGraph to bigWig
    # bedGraphToBigWig is a UCSC tool, commonly used for this conversion.
    # Its use is inferred as bigWig format is the specified output.
    bedGraphToBigWig "$OUTPUT_BEDGRAPH" "$CHROM_SIZES" "$OUTPUT_BIGWIG"
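
bedGraphToBigWig needs a chromosome sizes file that matches the genome the reads were mapped to (hg19 in this dataset). One way to guarantee the match is to derive the file from the alignment header instead of downloading it separately. The sketch below assumes the header is piped in as text, for example from `samtools view -H`; the function name `sam_header_to_chrom_sizes` is hypothetical, and nothing in the source indicates that the authors did this.

```shell
# Hypothetical helper (not from the source): convert SAM header @SQ lines
# into "name<TAB>length" rows, the format bedGraphToBigWig expects.
# Usage sketch: samtools view -H input.sorted.bam | sam_header_to_chrom_sizes > hg19.chrom.sizes
sam_header_to_chrom_sizes() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            # SN (sequence name) and LN (length) tags may appear in any order.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name != "" && len != "") print name "\t" len
        }
    '
}
```

Because the sizes come from the same index the reads were aligned against, a build mismatch between the coverage track and the chromosome sizes cannot occur.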
Raw Source Text
For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3). The reads with at least 18nt were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'.
For each small RNA-seq read, the firsst 5nt NNNTC was extracted and 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3). The target insert sequences were required to be at least 16 nt. The filtered reads were mapped to the genome genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'.
The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).
Genome_build: hg19
Supplementary_files_format_and_content: bigWig for coverage in both strands