[miso-users] errors with run_events_analysis

Yarden Katz yarden at MIT.EDU
Fri Mar 22 22:49:45 EDT 2013


Hi Maayan,

It was an unfortunate bug with handling of non-UCSC style chromosomes that is now fixed in Github.  I guess most people use Ensembl annotations from UCSC as opposed to Ensembl (which follow UCSC-style chromosome naming) so it was not reported.  You can download the latest version here:

https://github.com/yarden/MISO

You can click on the "zip" button to download the latest version.  Just install it in place of current and it should work.  I was able to run it on your BAM with:

$ run_events_analysis.py --compute-genes-psi ./indexed/ ./maayan.bam --output-dir ./miso_output/ --read-len 100 --prefilter -p 10

I include the output I got below.  You don't have to use the "--prefilter -p 10" options but they do speed things up considerably if you're running on a single multi-core machine and not a cluster (note that "--prefilter" requires bedtools/tagBam to be available.) 

If you confirm that it works for you, I'll make a new release stable with this issue fixed. 

Thanks.  Best, --Yarden



========
$ ./test_maayan.sh 
MISO (Mixture of Isoforms model)
Probabilistic analysis of RNA-Seq data to detect differential isoforms
Use --help argument to view options.

Loading settings from: /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
Computing gene-level Psi for genes...
  - GFF index: /lab/solexa_data2/yarden/test-maayan/indexed
  - BAM: /lab/solexa_data2/yarden/test-maayan/maayan.bam
  - Read length: 100
  - Output directory: /lab/solexa_data2/yarden/test-maayan/miso_output
  - Prefilter: True
  - Prefiltering on
Prefiltering reads...
Generating coverage file...
  - BAM file: /lab/solexa_data2/yarden/test-maayan/maayan.bam
  - GFF file: /lab/solexa_data2/yarden/test-maayan/indexed/genes.gff
  - Output file: /lab/solexa_data2/yarden/test-maayan/miso_output/maayan.bed
Executing: bedtools intersect -abam /lab/solexa_data2/yarden/test-maayan/maayan.bam -b /lab/solexa_data2/yarden/test-maayan/indexed/genes.gff -f 1 -ubam  | bedtools coverage -abam - -b /lab/solexa_data2/yarden/test-maayan/indexed/genes.gff -counts > /lab/solexa_data2/yarden/test-maayan/miso_output/maayan.bed
  - Total of 166 events pass coverage filter.
Using 10 processors
Mapping genes to their indexed GFF representation, using /lab/solexa_data2/yarden/test-maayan/indexed
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr18
  - Loading 537 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000219.1
  - Loading 6 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_APD
  - Loading 185 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr3
  - Loading 2897 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr20
  - Loading 1273 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR4_1
  - Loading 9 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_MANN
  - Loading 322 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr17
  - Loading 2027 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr8
  - Loading 2297 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000193.1
  - Loading 5 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000230.1
  - Loading 3 genes
Skipping: /lab/solexa_data2/yarden/test-maayan/indexed/compressed_ids_to_genes.shelve
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000192.1
  - Loading 7 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr19
  - Loading 1950 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr15
  - Loading 1797 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr10
  - Loading 2193 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr12
  - Loading 2740 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_DBB
  - Loading 333 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr16
  - Loading 1347 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000205.1
  - Loading 8 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000223.1
  - Loading 5 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_COX
  - Loading 362 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000240.1
  - Loading 3 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrX
  - Loading 2323 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_SSTO
  - Loading 326 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000224.1
  - Loading 4 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr14
  - Loading 2167 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr5
  - Loading 2720 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr22
  - Loading 1189 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000228.1
  - Loading 17 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr2
  - Loading 3856 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr6
  - Loading 2782 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr7
  - Loading 2754 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000213.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000201.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr1
  - Loading 5145 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000222.1
  - Loading 3 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000194.1
  - Loading 3 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr9
  - Loading 2336 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000209.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000212.1
  - Loading 4 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000199.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_MCF
  - Loading 286 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000233.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr13
  - Loading 1176 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000195.1
  - Loading 2 genes
Skipping: /lab/solexa_data2/yarden/test-maayan/indexed/genes_to_filenames.shelve
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR6_MHC_QBL
  - Loading 344 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr21
  - Loading 695 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrMT
  - Loading 37 genes
Skipping: /lab/solexa_data2/yarden/test-maayan/indexed/genes.gff
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000243.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000191.1
  - Loading 2 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrY
  - Loading 513 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr4
  - Loading 2477 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000220.1
  - Loading 18 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chr11
  - Loading 3107 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrHSCHR17_1
  - Loading 48 genes
Loading indexed gene filenames from: /lab/solexa_data2/yarden/test-maayan/indexed/chrGL000247.1
  - Loading 3 genes
Preparing to run 11 batches of jobs...
Running batch of 16 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-0_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-0
Running batch of 17 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-1_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-1
Running batch of 16 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-2_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-2
Running batch of 17 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-3_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-3
Running batch of 17 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-4_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-4
Running batch of 16 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-5_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-5
Running batch of 17 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-6_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-6
Running batch of 16 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-7_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-7
Running batch of 17 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-8_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-8
Running batch of 16 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-9_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-9
Running batch of 1 genes..
  - Executing: python /lab/solexa_data2/yarden/MISO/misopy/run_miso.py --compute-genes-from-file "/lab/solexa_data2/yarden/test-maayan/miso_output/batch-genes/batch-10_genes.txt" /lab/solexa_data2/yarden/test-maayan/maayan.bam /lab/solexa_data2/yarden/test-maayan/miso_output --read-len 100  --overhang-len 1 --settings-filename /lab/solexa_data2/yarden/MISO/misopy/settings/miso_settings.txt
  - Submitted thread batch-10
Waiting on 11 threads...
  - Threads completed in 0.17 hours.



On Mar 21, 2013, at 7:23 PM, Maayan Kreitzman wrote:

> Hi Yarden,
> 
> I just downloaded the Homo_sapiens.GRCh37.65.gff you supply in the manual at the bottom of the "Human/mouse gene models for isoform-centric analysis" section  (http://genes.mit.edu/burgelab/miso/docs/#iso-centric)
> the folders in the indexed/ directory do have the chr prefix so it must be that. Was this the wrong gff3 to be using?
> I  put a piece f the bam file on dropbox:  https://www.dropbox.com/s/eljbpt1d6vjhuww/A08586_test.bam
> 
> maayan
> ________________________________________
> From: Yarden Katz [yardenny at gmail.com] On Behalf Of Yarden Katz [yarden at mit.edu]
> Sent: Thursday, March 21, 2013 3:45 PM
> To: Maayan Kreitzman
> Cc: miso-users at mit.edu
> Subject: Re: [miso-users] errors with run_events_analysis
> 
> Hi Maayan,
> 
> I think it's a bug with chromosome naming of the index files that was introduced a while ago and was fixed.  The newest version of MISO (in github) should have better error handling.  But to be sure, could you please tell me how your indexed chromosome directories are named, i.e. what is:
> 
> $ ls /projects/mkreitzman_prj/expression_quantification_testing/testing/MISO/indexed/
> 
> Finally, the easiest way to get to the bottom of it is if you send me the GFF you used (I assume it's from an Ensembl annotaton) and any small portion of the BAM.  I don't need the whole BAM, you can simply take out any segment of it that has the headers, e.g. one chromosome's worth of reads or even less, such as:
> 
> # pulls out all of chromosome 1
> $ samtools view -bh yourbam.bam 1 -o test.bam
> 
> # pulls out a segment of chromosome 1, bases 1-10000000
> $ samtools view -bh yourbam.bam 1:1-10000000 -o test.bam
> 
> A small BAM segment reveals nothing of anything and I assure you I will delete it once I get to the bottom of this bug.  I'm sure that this will be trivial to fix (if it isn't already fixed in the latest GitHub version).
> 
>   --Yarden
> 
> 
> 
> 
> On Mar 21, 2013, at 1:10 PM, Maayan Kreitzman wrote:
> 
>> Hi Yarden,
>> 
>> As far as I can tell, there are no prefixes in the bam file.
>> 
>> I attached the results from the three commands so you can take a look.
>> 
>> thanks for you help!
>> 
>> Maayan
>> ________________________________________
>> From: Yarden Katz [yardenny at gmail.com] On Behalf Of Yarden Katz [yarden at mit.edu]
>> Sent: Wednesday, March 20, 2013 5:50 PM
>> To: Maayan Kreitzman
>> Cc: miso-users at mit.edu
>> Subject: Re: [miso-users] errors with run_events_analysis
>> 
>> It would also be useful to see:
>> 
>> $ head /projects/mkreitzman_prj/expression_quantification_testing/testing/MISO/indexed/genes.gff
>> 
>> 
>> --Yarden
>> 
>> 
>> 
>> On Mar 20, 2013, at 8:46 PM, Yarden Katz wrote:
>> 
>>> Hi Maayan,
>>> 
>>> Can you confirm that your BAM file does not have chr prefixes?  I.e. can you please let me know what the output of these commands are:
>>> 
>>> $ samtools view /projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/JAGuaR_rev_87647/A08586_gr3.neg.bam | head
>>> 
>>> and
>>> 
>>> $ samtools view -H /projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/JAGuaR_rev_87647/A08586_gr3.neg.bam
>>> 
>>> Best, --Yarden
>>> 
>>> 
>>> 
>>> 
>>> On Mar 20, 2013, at 8:35 PM, Maayan Kreitzman wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> I'm new to MISO, and would like some help. I've successfully installed misopy-0.4.7.
>>>> 
>>>> I also downloaded and indexed the ensembl gff3 (no "chr" in the chromosome names).
>>>> The next step isn't working though. My library is a paired-end strand specific library.
>>>> 
>>>> The command I ran is this:
>>>> 
>>>> python misopy/run_events_analysis.py --paired-end 75 8 --compute-genes-psi /projects/mkreitzman_prj/expression_quantification_testing/testing/MISO/indexed/ /projects/mkreitzman_prj/expression_quantification_testing/testing/test_data/strand_specific/JAGuaR_rev_87647/A08586_gr3.neg.bam --output-dir /projects/mkreitzman_prj/expression_quantification_testing/testing/MISO/results --read-len 100 &> /projects/mkreitzman_prj/expression_quantification_testing/testing/MISO/results/A08586.log
>>>> 
>>>> the errors I get look like this (I've attached the full std out + stderr):
>>>> Traceback (most recent call last):
>>>> File "/projects/mkreitzman_prj/expression_quantification_testing/misopy-0.4.7/misopy/run_miso.py", line 488, in <module>
>>>> main()
>>>> File "/projects/mkreitzman_prj/expression_quantification_testing/misopy-0.4.7/misopy/run_miso.py", line 432, in main
>>>> run_compute_genes_from_file(options)
>>>> File "/projects/mkreitzman_prj/expression_quantification_testing/misopy-0.4.7/misopy/run_miso.py", line 247, in run_compute_genes_from_file
>>>> event_type=options.event_type)
>>>> File "/projects/mkreitzman_prj/expression_quantification_testing/misopy-0.4.7/misopy/run_miso.py", line 128, in compute_gene_psi
>>>> gene_obj)
>>>> File "/home/mkreitzman/.local/lib/python2.7/site-packages/misopy-0.4.7-py2.7-linux-x86_64.egg/misopy/sam_utils.py", line 191, in fetch_bam_reads_in_gene
>>>> chrom = chrom_parts[1]
>>>> IndexError: list index out of range
>>>> 
>>>> The bam file is position sorted and has a .bai in the same folder. The parameters for the --paired-end are just off the top of my head, but I tried without this option at all and the same thing happens.
>>>> 
>>>> Thanks in advance for any ideas you may have.
>>>> 
>>>> Maayan<A08586.log>_______________________________________________
>>>> miso-users mailing list
>>>> miso-users at mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>>> 
>>> 
>>> _______________________________________________
>>> miso-users mailing list
>>> miso-users at mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>> 
> 




More information about the miso-users mailing list