[StarCluster] Tophat run on a 2-node cluster
Manuel J. Torres
mjtorres.phd at gmail.com
Fri Aug 2 16:51:16 EDT 2013
I am trying to run the TopHat software to map ~38 GB of RNA-seq reads in
FASTQ format to a reference genome on a 2-node cluster with the following
properties:
NODE_IMAGE_ID = ami-999d49f0
NODE_INSTANCE_TYPE = c1.xlarge
Question: How many CPUs are there on this type of cluster?
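(As a sanity check, here is a minimal sketch of what I can run on the master
node to count cores, assuming the standard StarCluster Ubuntu AMI; StarCluster
also sets up SGE, so qhost should report every node:)

# Virtual CPUs visible on this node
nproc
grep -c ^processor /proc/cpuinfo
# SGE's view of CPUs and memory across the whole cluster
qhost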
Here is a df -h listing of my cluster:
root@master:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      9.9G  9.9G     0 100% /
udev            3.4G  4.0K  3.4G   1% /dev
tmpfs           1.4G  184K  1.4G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            3.5G     0  3.5G   0% /run/shm
/dev/xvdb1      414G  199M  393G   1% /mnt
/dev/xvdz        99G   96G     0 100% /home/large-data
/dev/xvdy        20G  5.3G   14G  29% /home/genomic-data
I created a third volume for the output; it does not appear in this listing,
but it is defined in my config file, and I have verified that I can read from
and write to it. I am writing the output files to this larger, empty volume.
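For context, a volume like this is attached through the StarCluster config
roughly as sketched below; the section name and volume ID here are
placeholders, not my actual values:

[volume results]
# Placeholder EBS volume ID
VOLUME_ID = vol-XXXXXXXX
# Where the volume is mounted and NFS-shared on the nodes
MOUNT_PATH = /home/results-data

[cluster smallcluster]
VOLUMES = results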
I can't get TopHat to run to completion; it appears to be generating
truncated intermediate files. Here is the TopHat output:
[2013-08-01 17:34:19] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-08-01 17:34:19] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-08-01 17:34:21] Checking for Samtools
Samtools version: 0.1.19.0
[2013-08-01 17:34:21] Checking for Bowtie index files (genome)..
[2013-08-01 17:34:21] Checking for reference FASTA file
[2013-08-01 17:34:21] Generating SAM header for
/home/genomic-data/data/Nemve1.allmasked
format: fastq
quality scale: phred33 (default)
[2013-08-01 17:34:27] Reading known junctions from GTF file
[2013-08-01 17:36:56] Preparing reads
left reads: min. length=50, max. length=50, 165174922 kept reads (113024 discarded)
[2013-08-01 18:24:07] Building transcriptome data files..
[2013-08-01 18:26:43] Building Bowtie index from Nemve1.allmasked.fa
[2013-08-01 18:29:01] Mapping left_kept_reads to transcriptome
Nemve1.allmasked with Bowtie2
[2013-08-02 07:34:40] Resuming TopHat pipeline with unmapped reads
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[2013-08-02 07:34:41] Mapping left_kept_reads.m2g_um to genome
Nemve1.allmasked with Bowtie2
[main_samview] truncated file.
[main_samview] truncated file.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from
"/home/results-data/top-results-8-01-2013/topout/tmp/left_kept_reads.m2g_um_unmapped.bam".
[2013-08-02 07:34:54] Retrieving sequences for splices
[2013-08-02 07:35:16] Indexing splices
Warning: Empty fasta file:
'/home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs.fa'
Warning: All fasta inputs were empty
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home/genomic-data/bin/bowtie2-2.1.0/bowtie2-build
/home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs.fa
/home/results-data/top-results-8-01-2013/topout/tmp/segment_juncs
[FAILED]
Error: Splice sequence indexing failed with err =1
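As a minimal check, assuming samtools is on the PATH, streaming the suspect
BAM should reproduce the EOF warning if the file really is truncated, and
watching free space should show whether a full filesystem is cutting the
writes short:

samtools view \
  /home/results-data/top-results-8-01-2013/topout/tmp/left_kept_reads.m2g_um_unmapped.bam \
  > /dev/null
# Free space on the root and output filesystems while TopHat runs
df -h / /home/results-data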
Questions:
Am I running out of memory? (See the quick check sketched after these
questions.)
How much RAM does the AMI have, and can I make that larger?
No matter what StarCluster configuration I define, I can't seem to make my
root directory larger than 10 GB, and it appears to be full.
Can I make the root directory larger than 10 GB?
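Here is the quick memory check mentioned above, plus a sketch of rerunning
TopHat with its output (and tmp files) on the large, nearly empty disk at
/mnt; -p 8 assumes c1.xlarge exposes 8 virtual cores, and genes.gtf /
reads.fastq are placeholders for my real inputs:

# Total and free RAM on this node
free -m
# Rerun with threads set and output directed at the 414G /mnt disk
tophat -p 8 -o /mnt/topout -G genes.gtf \
  /home/genomic-data/data/Nemve1.allmasked reads.fastq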
Thanks!
--
Manuel J Torres, PhD
219 Brannan Street Unit 6G
San Francisco, CA 94107
VOICE: 415-656-9548