GSE90650 Processing Pipeline

ncRNA-Seq, 6 steps

Publication

NEAT1 scaffolds RNA-binding proteins and the Microprocessor to globally enhance pri-miRNA processing.

Nature Structural & Molecular Biology (2017), PMID 28846091

Dataset

The LncRNA NEAT1 Nests RNA Binding Proteins and the Microprocessor to Globally Enhance Pri-miRNA Processing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3).

cutadapt v1.8.3 GitHub

$ Bash example

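# Install cutadapt (version 1.8.3 is old; consider a newer release if exact reproduction is not required)
# conda install -c bioconda cutadapt=1.8.3

# Remove the first 4 nt (index) and the 3'-adaptor sequence.
# Assumption: -u 4 simply discards the index; the source does not say how it was stored.
# Assumption: -m 18 applies the 18 nt minimum stated in the mapping step.
cutadapt -u 4 -a CTCGTATGCCGTCTTCTGCTTG -m 18 -o trimmed_reads.fastq.gz reads.fastq.gz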
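
The source says the 4 nt index was "extracted", which leaves open whether it was discarded or kept with the read. The sketch below assumes it should be kept: it moves the first N bases of each read into the read name and trims them from the sequence and quality lines. The function name `extract_index`, the `_INDEX` suffix convention, and the demo read are hypothetical and not part of the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not from the original pipeline.
# Reads 4-line FASTQ records on stdin, appends the first N bases to the read
# name (text after the first space in the name line is dropped), and removes
# those bases from the sequence and quality lines.
extract_index() {
    local n="$1"
    awk -v n="$n" '
        NR % 4 == 1 { name = $1; next }
        NR % 4 == 2 { print name "_" substr($0, 1, n); print substr($0, n + 1); next }
        NR % 4 == 3 { print "+"; next }
        NR % 4 == 0 { print substr($0, n + 1) }
    '
}

# Demo on a single made-up read.
demo_dir="$(mktemp -d)"
printf '@read1\nACGTTTGGCCAA\n+\nIIIIIIIIIIII\n' > "$demo_dir/reads.fastq"
extract_index 4 < "$demo_dir/reads.fastq" > "$demo_dir/indexed.fastq"
cat "$demo_dir/indexed.fastq"
```

For gzipped files, the function can sit between `gzip -dc reads.fastq.gz` and `gzip -c`, with adaptor trimming run on the result.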

The reads with at least 18nt were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'.

Bowtie v1.1.1 GitHub

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie

# Ensure the hg19 index is built or available. 
# If not, you would build it using bowtie-build:
# bowtie-build <path_to_hg19.fa> hg19

# Align reads to hg19 (-S writes SAM; without it Bowtie 1 prints its own tabular format)
bowtie -S -l 25 -n 2 -k 1 -m 10 -e 200 --best --strata hg19 input_reads.fastq > output_aligned_reads.sam

For each small RNA-seq read, the first 5nt NNNTC was extracted and the 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3).

cutadapt v1.8.3 GitHub

$ Bash example

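# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.8.3

# Example usage, assuming input.fastq.gz holds the raw small RNA-seq reads
# and output.fastq.gz receives the trimmed reads.
# -u 5 removes the first 5 nt (NNNTC) from every read unconditionally;
# it does not verify that positions 4-5 are TC.
# -a removes the 3' adaptor; N in the adaptor matches any base.
cutadapt -u 5 \
         -a NNCTCGTATGCCGTCTTCTGCTTG \
         -o output.fastq.gz \
         input.fastq.gz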

The target insert sequences were required to be at least 16 nt.

fastp (Inferred with models/gemini-2.5-flash) v0.20.0 GitHub

$ Bash example

# Install fastp if not already installed
# conda install -c bioconda fastp

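# Keep only reads of at least 16 nt.
# fastp also trims adapters and filters on quality by default; the two --disable
# options switch those off so that only the length filter is applied.
# Single-end input is assumed. For paired-end data, add --in2 and --out2.
fastp --in1 input.fastq.gz --out1 output.fastq.gz --length_required 16 \
      --disable_adapter_trimming --disable_quality_filtering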
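
fastp was inferred for this step rather than named in the source, and a length cutoff can be applied without it. As a dependency-free alternative, this sketch keeps only FASTQ records whose sequence is at least a given length. The function name `filter_min_length` and the demo reads are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not from the original pipeline.
# Reads 4-line FASTQ records on stdin and prints only those whose
# sequence line has at least MIN characters.
filter_min_length() {
    local min="$1"
    awk -v min="$min" '
        { rec[(NR - 1) % 4] = $0 }                  # buffer the current record
        NR % 4 == 0 && length(rec[1]) >= min {      # rec[1] is the sequence line
            print rec[0]; print rec[1]; print rec[2]; print rec[3]
        }
    '
}

# Demo with one 20 nt read and one 10 nt read (made-up data).
demo_dir="$(mktemp -d)"
{
    printf '@long\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n'
    printf '@short\nACGTACGTAC\n+\nIIIIIIIIII\n'
} > "$demo_dir/trimmed.fastq"
filter_min_length 16 < "$demo_dir/trimmed.fastq" > "$demo_dir/filtered.fastq"
cat "$demo_dir/filtered.fastq"
```

Another option would be cutadapt's own `-m 16` in the trimming step, which applies the cutoff without a separate pass.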

The filtered reads were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'.

Bowtie v1.1.1 GitHub

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie=1.1.1

# Placeholder for Bowtie index for hg19. This index needs to be built prior to alignment.
# Example: bowtie-build <hg19.fa> hg19_index
hg19_index="/path/to/bowtie_indexes/hg19"

# Placeholder for input filtered reads file
filtered_reads="filtered_reads.fastq"

# Placeholder for output SAM file
output_sam="mapped_reads.sam"

# -S writes SAM; without it Bowtie 1 prints its own tabular format
bowtie -S -n 0 -e 80 -l 18 -a -m 5 --best --strata "${hg19_index}" "${filtered_reads}" > "${output_sam}"
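
With `-a --best --strata -m 5`, a read can appear on several output lines, one per reported alignment. A minimal sketch for checking how many reads were reported with 1, 2, or more alignments follows. It assumes SAM input (Bowtie run with `-S`); the function name `tally_alignments_per_read` and the demo records are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not from the original pipeline.
# Prints "<alignments per read><TAB><number of reads>" for a SAM file,
# ignoring header lines and unaligned records (FLAG bit 0x4).
tally_alignments_per_read() {
    awk -F '\t' '
        /^@/ { next }                       # header line
        int($2 / 4) % 2 == 1 { next }       # unaligned read
        { hits[$1]++ }                      # one SAM line per reported alignment
        END {
            for (read in hits) reads_with[hits[read]]++
            for (n in reads_with) printf "%d\t%d\n", n, reads_with[n]
        }
    ' "$1" | sort -n
}

# Demo SAM: r1 and r3 align once, r2 three times, r4 is unaligned (made-up data).
demo_dir="$(mktemp -d)"
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf 'r1\t0\tchr1\t100\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r2\t0\tchr1\t500\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r2\t16\tchr2\t900\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r2\t0\tchr3\t40\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r3\t16\tchr5\t70\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$demo_dir/demo.sam"
tally_alignments_per_read "$demo_dir/demo.sam"
```

With `-m 5` in effect, no row above 5 should appear in real output; a larger value would suggest the file came from a different Bowtie invocation.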

The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).

samtools v0.1.19 GitHub

$ Bash example

# Install bedtools (version 2.24) and samtools (version 0.1.19) if not already installed
# conda install -c bioconda bedtools=2.24 samtools=0.1.19

# Install UCSC tools (bedGraphToBigWig) if not already installed
# wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig -O bedGraphToBigWig
# chmod +x bedGraphToBigWig

# Define input and output files
INPUT_BAM="input.sorted.bam" # Placeholder: Assuming input is a sorted and indexed BAM file
OUTPUT_BEDGRAPH="output.bedgraph"
OUTPUT_BIGWIG="output.bigWig"
CHROM_SIZES="hg19.chrom.sizes" # Placeholder: path to the chromosome sizes file for hg19 (e.g., from UCSC goldenPath)

# Example of how to get chrom.sizes for hg19 (if not already available)
# fetchChromSizes hg19 > hg19.chrom.sizes

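# Generate bedGraph from BAM using genomeCoverageBed.
# samtools 0.1.19 might have been used to convert, sort, and index the alignments before this step.
# The output is sorted by chromosome and start because bedGraphToBigWig expects sorted input.
genomeCoverageBed -ibam "$INPUT_BAM" -bg -g "$CHROM_SIZES" | LC_COLLATE=C sort -k1,1 -k2,2n > "$OUTPUT_BEDGRAPH"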

# Convert bedGraph to bigWig
# bedGraphToBigWig is a UCSC tool, commonly used for this conversion.
# Its use is inferred as bigWig format is the specified output.
bedGraphToBigWig "$OUTPUT_BEDGRAPH" "$CHROM_SIZES" "$OUTPUT_BIGWIG"
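
The source lists the supplementary files as bigWig coverage "in both strands", which suggests one track per strand, although the exact commands are not given. genomeCoverageBed has a `-strand` option for this. As a tool-free illustration of the same idea, the sketch below splits aligned SAM records by the reverse-strand flag (0x10) so that each strand can be processed separately. The function name `split_by_strand`, the output file naming, and the demo records are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not from the original pipeline.
# Writes PREFIX.plus.sam and PREFIX.minus.sam; header lines go to both,
# unaligned records (FLAG bit 0x4) are dropped.
split_by_strand() {
    local sam="$1" prefix="$2"
    awk -F '\t' -v plus="${prefix}.plus.sam" -v minus="${prefix}.minus.sam" '
        /^@/ { print > plus; print > minus; next }       # copy header to both files
        int($2 / 4) % 2 == 1 { next }                    # skip unaligned reads
        int($2 / 16) % 2 == 1 { print > minus; next }    # reverse-strand alignment
        { print > plus }                                 # forward-strand alignment
    ' "$sam"
}

# Demo SAM: r1 on the forward strand, r2 on the reverse strand, r3 unaligned (made-up data).
demo_dir="$(mktemp -d)"
{
    printf '@SQ\tSN:chr1\tLN:249250621\n'
    printf 'r1\t0\tchr1\t100\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r2\t16\tchr1\t900\t255\t18M\t*\t0\t0\t*\t*\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$demo_dir/demo.sam"
split_by_strand "$demo_dir/demo.sam" "$demo_dir/demo"
grep -v '^@' "$demo_dir/demo.plus.sam" "$demo_dir/demo.minus.sam"
```

Each output keeps the header lines, so it can be converted to sorted BAM and passed through the coverage and bigWig commands to give one track per strand.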

Raw Source Text

For each CLIP-seq read, the first 4nt index sequence was extracted and the 3'-adaptor sequence CTCGTATGCCGTCTTCTGCTTG was removed with program cutadapt (version 1.8.3). The reads with at least 18nt were mapped to the human genome (hg19) by Bowtie (version 1.1.1) with parameters '-l25 -n2 -k1 -m10 -e 200 --best --strata'.
For each small RNA-seq read, the firsst 5nt NNNTC was extracted and 3' adaptor sequence 'NNCTCGTATGCCGTCTTCTGCTTG' was removed with program cutadapt (version 1.8.3). The target insert sequences were required to be at least 16 nt. The filtered reads were mapped to the genome genome (hg19) by Bowtie (version 1.1.1) with parameters '-n 0 -e 80 -l 18 -a -m 5 --best --strata'.
The mapped reads were used to generate the genomic coverage profiles in bigWig format with programs genomeCoverageBed (bedtools version 2.24) and samtools (version 0.1.19).
Genome_build: hg19
Supplementary_files_format_and_content: bigWig for coverage in both strands
