We want to build a web-based data analysis platform. The basic idea is that the user visits the homepage, where a prompt says "click here to upload your data". After the upload, an analysis is run using Linux-based open-source tools. The results are then displayed, and the user can download their analyses. I can send you examples of similar sites with similar functionality. Is that something you all could do?
We are the National Institute of Standards and Technology. We produce reference materials and reference data for the entire US. The site we're building will be for internal use only, behind a firewall. It won't be available to the public.
Here is some content for the site, with an image attached.
iBioCloud Tool for Analyzing Data for RM8376
The goal is to build a user-friendly, web-based interface that allows users to analyze the raw data they generate from RM8376. The same interface displays the results in a meaningful, visually insightful way. This cloud analytics tool will run on a NIST AWS instance with >100GB of RAM and >24 processors to reduce analysis time. Initially this tool will be behind the NIST firewall, but eventually it could be made available to outside customers.
Option: use Amazon S3 buckets to house the data, rather than storing it directly on the EC2 instance.
In short, this tool will perform the following processes:
Take raw data and metadata as input from the user (upload fastq file and enter metadata).
Perform a file size check and a quality filter using seqtk. Generates a new filtered fastq file
Map reads to each reference genome using bowtie2. Generates 1 sam file
Convert sam file to bam file using samtools
Sort bam files using samtools
Perform mapping summary using QualiMap tool. Generates .html report
Display QualiMap reports
Report the relative abundance of each genome (from genome coverage)
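To make the steps above concrete, here is a rough Python sketch of how the commands could be assembled for one reference genome. It is only an illustration, not a finished pipeline. Assumptions on my part: "seqtk trimfq" is the intended quality filter, reads are single-end (hence bowtie2 -U), the output file names are made up, and relative abundance is taken to mean each genome's mean coverage divided by the sum over all genomes. The function names are hypothetical.

```python
from pathlib import Path


def filter_command(raw_fastq):
    """Quality-trim reads with seqtk (assumed to be the intended filter).

    seqtk writes the trimmed reads to stdout, so the caller must redirect
    stdout into the new filtered fastq file.
    """
    return ["seqtk", "trimfq", str(raw_fastq)]


def mapping_commands(filtered_fastq, index_prefix, out_dir, threads=24):
    """Return the bowtie2, samtools and QualiMap commands for one genome."""
    out_dir = Path(out_dir)
    name = Path(index_prefix).name  # hypothetical: outputs are named after the index
    sam = out_dir / f"{name}.sam"
    bam = out_dir / f"{name}.bam"
    sorted_bam = out_dir / f"{name}.sorted.bam"
    report_dir = out_dir / f"{name}_qualimap"
    return [
        # Map single-end reads against the pre-built bowtie2 index.
        ["bowtie2", "-p", str(threads), "-x", str(index_prefix),
         "-U", str(filtered_fastq), "-S", str(sam)],
        # Convert sam to bam.
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Sort the bam file by coordinate.
        ["samtools", "sort", "-@", str(threads), "-o", str(sorted_bam), str(bam)],
        # Mapping summary as an HTML report.
        ["qualimap", "bamqc", "-bam", str(sorted_bam),
         "-outdir", str(report_dir), "-outformat", "HTML"],
    ]


def relative_abundance(mean_coverage):
    """Assumed definition: a genome's mean coverage divided by the total."""
    total = sum(mean_coverage.values())
    if total == 0:
        return {genome: 0.0 for genome in mean_coverage}
    return {genome: cov / total for genome, cov in mean_coverage.items()}
```

Each command list could be passed to subprocess.run(cmd, check=True), repeating mapping_commands once per reference genome. The coverage values fed to relative_abundance would have to be pulled from the QualiMap output; that parsing is not shown here.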
Before you start:
Install on Linux Ubuntu EC2 instance:
Environment Modules, for version control of software
On the EC2 instance (or S3 bucket), establish a permanent folder of reference genomes
All RM8376 genomes (20) are present as fasta files
Index each genome using bowtie2 (use the “bowtie2-build” command). Indexed genomes will have the .bt2 extension. Note that multiple .bt2 files will be generated for each genome
This only needs to be done once, after which this folder should contain 20 fasta files and all the corresponding .bt2 files
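As an illustration of the one-time indexing step, here is a small Python sketch that lists the bowtie2-build commands still needed for a genome folder. Assumptions on my part: the reference files end in ".fasta", each index prefix is the fasta path without its extension, and a genome counts as indexed if at least one matching .bt2 file already exists (a partially built index would slip through this check). The function names are hypothetical.

```python
from pathlib import Path


def is_indexed(fasta):
    """True if at least one bowtie2 index file sits next to the fasta file."""
    fasta = Path(fasta)
    # bowtie2-build writes files such as <prefix>.1.bt2 and <prefix>.rev.1.bt2.
    # Large indexes (.bt2l) are not checked here.
    return any(fasta.parent.glob(f"{fasta.stem}.*.bt2"))


def index_commands(genome_dir):
    """Return one bowtie2-build command per fasta file that lacks an index."""
    commands = []
    for fasta in sorted(Path(genome_dir).glob("*.fasta")):
        if is_indexed(fasta):
            continue  # already indexed, nothing to do
        prefix = fasta.with_suffix("")  # index prefix = fasta path minus extension
        commands.append(["bowtie2-build", str(fasta), str(prefix)])
    return commands
```

Running each returned command with subprocess.run(cmd, check=True) would build the missing indexes; running index_commands again afterwards should return an empty list.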
Set user files to be deleted automatically after 30 days?
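If the 30-day cleanup is adopted, it could look roughly like the Python sketch below, run on a schedule (for example from cron). Assumptions on my part: user uploads and results live in their own folder, separate from the permanent reference genome folder, and a file's age is judged by its last-modified time. The function name is hypothetical.

```python
import time
from pathlib import Path


def delete_old_files(root, max_age_days=30, now=None):
    """Delete regular files under root last modified more than max_age_days ago.

    Returns the list of deleted paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    # Materialize the listing first so deletion does not disturb the walk.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The function should only ever be pointed at the user data folder, never at the folder holding the reference genomes and their .bt2 indexes.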