[miso-users] Error: tagBam call failed when running pe_utils.py --compute-insert-len

Yarden Katz yarden at MIT.EDU
Mon Jul 8 10:01:18 EDT 2013


Hi,

Please see:

http://genes.mit.edu/burgelab/miso/docs/#answer13

We are working on native support for replicates.

  Best, --Yarden

On Jul 7, 2013, at 4:29 PM, Gu Mi wrote:

> Hi Yarden:
> 
> Thanks for your solution -- so far it works well by the use of ensGene.gff3.
> 
> I have one more question: actually I have two conditions for comparison of differential exon usage, a control group and a treatment group, each having 3 biological replicates. Currently I only compare C1 to T1, but how can I use all the biological replicates? Shall I merge the 3 BAM files in each condition into a single one, and then compare Control_merged and Treatment_merged?
> 
> Thank you!
> 
> Best,
> Gu
> 
> 
> On Sun, Jul 7, 2013 at 3:05 PM, Yarden Katz <yarden at mit.edu> wrote:
> Hi,
> 
> It does look like a headers mismatch then.  Your BAM file contains "chr" style chromosomes (e.g. "chr10" and not "10").  I believe your GFF, /directories/exons/Homo_sapiens.GRCh37.65.min_1000.const_exons.gff, is from Ensembl which would not contain chr-prefixes.  Just look in that gff file and see what the chromosome entries are like.  If they don't have chr, the operation will fail.  All you need to do is generate a constitutive exons file from a UCSC gff which contains chromosome headers that match your .bam file.
> 
> See:
> 
> http://genes.mit.edu/burgelab/miso/docs/#human-mouse-gene-models-for-isoform-centric-analyses
> 
> If you're using hg19, you can use this GFF:
> 
> http://genes.mit.edu/burgelab/miso/annotations/ucsc_tables/hg19/ensGene.gff3
> 
> Use our exon_utils program to generate constitutive exons from this file and then rerun pe_utils with that, instead of the GRCh37 gff file.
> 
>   --Yarden
> 
> 
> 
> On Jul 7, 2013, at 2:27 PM, Gu Mi wrote:
> 
> > Dear Yarden:
> >
> > Thank you for your prompt reply! Here is the output from the first command:
> >
> > @HD     VN:1.0  SO:coordinate
> > @SQ     SN:chr1 LN:249250621
> > @SQ     SN:chr10        LN:135534747
> > @SQ     SN:chr11        LN:135006516
> > @SQ     SN:chr12        LN:133851895
> > @SQ     SN:chr13        LN:115169878
> > @SQ     SN:chr14        LN:107349540
> > @SQ     SN:chr15        LN:102531392
> > @SQ     SN:chr16        LN:90354753
> > @SQ     SN:chr17        LN:81195210
> > @SQ     SN:chr18        LN:78077248
> > @SQ     SN:chr19        LN:59128983
> > @SQ     SN:chr2 LN:243199373
> > @SQ     SN:chr20        LN:63025520
> > @SQ     SN:chr21        LN:48129895
> > @SQ     SN:chr22        LN:51304566
> > @SQ     SN:chr3 LN:198022430
> > @SQ     SN:chr4 LN:191154276
> > @SQ     SN:chr5 LN:180915260
> > @SQ     SN:chr6 LN:171115067
> > @SQ     SN:chr7 LN:159138663
> > @SQ     SN:chr8 LN:146364022
> > @SQ     SN:chr9 LN:141213431
> > @SQ     SN:chrM LN:16569
> > @SQ     SN:chrX LN:155270560
> > @SQ     SN:chrY LN:59373566
> > @PG     ID:TopHat       CL:/informatics/tools/Linux-AS5/bin/tophat -o dir/Lane3 -g 1 --coverage-search --microexon -r 100 --phred64-quals --library-type fr-unstranded -p 4 -G gene_models/Homo_sapiens.GRCh37.72_norm.gtf --transcriptome-index=gene_models/transcripts dir/Human/bowtie/human_ref_genome dir/L1_1.fq.gz dir/L1_2.fq.gz    VN:1.4.1
> >
> > The output for the second command is exactly the same as above. As they're not empty, I guess there might be other reasons for the error?
> >
> > Thanks again!
> >
> > Best,
> > Gu
> >
> >
> > On Sun, Jul 7, 2013 at 2:15 PM, Yarden Katz <yarden at mit.edu> wrote:
> > Hi,
> >
> > See replies below:
> >
> > On Jul 7, 2013, at 2:04 PM, Gu Mi wrote:
> >
> >> Dear All:
> >>
> >> I am using MISO to test for differential exon usage between a control and a treatment group. I got an error when computing the insert length distribution using pe_utils.py --compute-insert-len. I list the steps I used below:
> >>
> >> 1. sort the BAM file from TopHat (by coordinate):
> >> samtools sort control.bam control_sorted
> >>
> >> 2. index the BAM file:
> >> samtools index control_sorted.bam control_sorted.bai
> >>
> >> 3. run pe_utils.py:
> >> python pe_utils.py --compute-insert-len controlam /directories/exons/Homo_sapiens.GRCh37.65.min_1000.const_exons.gff --output-dir /directories/insert-dist/
> >>
> >> After the command above, I got the error message:
> >>
> >> Preparing to call bedtools 'tagBam'
> >> tagBam -i control.bam -files /directories/exons/Homo_sapiens.GRCh37.65.min_1000.const_exons.gff -labels gff -intervals -f 1 | samtools view - -h | egrep '^@|:gff:' | samtools view - -Shb -o /directories/insert-dist/bam2gff_Homo_sapiens.GRCh37.65.min_1000.const_exons.gff/control.bam
> >> [samopen] SAM header is present: 25 sequences.
> >> [sam_read1] reference 'ID:TopHat        CL:/informatics/tools/Linux-AS5/bin/tophat -o Lane3 -g 1 --coverage-search --microexon -r 100 --phred64-quals --library-type fr-unstranded -p 4 -G gene_models/Homo_sapiens.GRCh37.72_norm.gtf --transcriptome-index=gene_models/transcripts /directories/Genomes/NCBI_Jul-09-2012/Human/bowtie/human_ref_genome Lane3_1.fq.gz Lane3_2.fq.gz     VN:1.4.1
> >> ' is recognized as '*'.
> >> [main_samview] truncated file.
> >> Traceback (most recent call last):
> >>   File "/pe_utils.py", line 520, in <module>
> >>     main()
> >>   File "pe_utils.py", line 517, in main
> >>     sd_max=sd_max)
> >>   File "pe_utils.py", line 271, in compute_insert_len
> >>     output_dir)
> >>   File "exon_utils.py", line 185, in map_bam2gff
> >>     raise Exception, "Error: tagBam call failed."
> >> Exception: Error: tagBam call failed.
> >
> > It sounds like tagBam could not map any of your reads to the GFF, which is usually a headers mismatch issue.
> >
> > Could you confirm that your BAM file contains Ensembl style headers, and not UCSC ones?  What is the output of this command command for you:
> >
> > samtools view -H /directories/insert-dist/bam2gff_Homo_sapiens.GRCh37.65.min_1000.const_exons.gff/control.bam
> >
> > Also, what is the output of the following for you?
> >
> > tagBam -i control.bam -files /directories/exons/Homo_sapiens.GRCh37.65.min_1000.const_exons.gff -labels gff -intervals -f 1 | samtools view - -h | egrep '^@|:gff:'
> >
> > If this yields something empty, it means that tagBam could not match your BAM to the GFF, probably because of the chromosome headers mismatch issue that I mentioned.
> >
> > Best, --Yarden
> >
> >>
> >> I used Homo_sapiens.GRCh37.72_norm.gtf from Ensembl as the annotation file when preparing my data, but downloaded
> >>      • Human genome (hg19) alternative events v2.0
> >> from the MISO website and unzipped. I saw it is based on Homo_sapiens.GRCh37.65. Is this the version problem? If so, could anyone provide the latest GFF3 file for use? Thank you for your suggestions!
> >>
> >> Best,
> >> Gu
> >> _______________________________________________
> >> miso-users mailing list
> >> miso-users at mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/miso-users
> >
> >
> >
> >
> > --
> > Ph.D. student
> > Department of Statistics
> > Oregon State University
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > "Some people dream of success, while others wake up and work hard at it." ~ Author Unknown.
> >
> 
> 
> 
> 
> -- 
> Ph.D. student
> Department of Statistics
> Oregon State University
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> "Some people dream of success, while others wake up and work hard at it." ~ Author Unknown.
> 




More information about the miso-users mailing list