[miso-users] Install miso using pip and mixed read length BAMs

Yarden Katz yarden at mit.edu
Tue Jan 6 21:09:26 EST 2015


Hi,

See below for comments:

On Jan 5, 2015, at 6:29 AM, Maurits Evers <maurits.evers at ur.de> wrote:

> Dear all.
> 
> I have been trying to install&run miso on my Mac and have run into a 
> couple of problems/issues. Any help and/or clarifications would be 
> greatly appreciated.
> 
> 1. I did a global install following the recommended installation method 
> using pip. Everything seems to install fine, and importing misopy and 
> pysplicing from within python works. However, miso, module_availability 
> and test_miso are unknown commands. Chasing the binaries on my machine, 
> I can see that they are located at 
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin. Adding 
> this location to PATH fixes the issue of the unknown miso executables. 
> Do I need to add anything else?

When you install MISO with a package manager like "pip", the package's executables (scripts like "miso") are placed in a system-specific binary directory.  Its location is unfortunately not standardized; in your case it happens to be /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin, and on some systems it is ~/.local/bin.  That directory has to be in your PATH for the executables to be accessible.  You only need to add it once, after which the executables of all Python packages installed there should be available, so there is no need to do anything else.
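
If it helps, here is a rough sketch (not part of MISO) of how to ask Python itself where it puts console scripts and whether those directories are on your PATH.  The helper names are made up for illustration, and the per-user location assumes a POSIX-style layout (<userbase>/bin):

```python
import os
import site
import sysconfig


def candidate_script_dirs():
    """Directories where pip may have placed console scripts such as 'miso'."""
    # Scripts directory of the running interpreter (system-wide or virtualenv).
    dirs = [sysconfig.get_path("scripts")]
    # Assumption: POSIX layout, where per-user installs use <userbase>/bin.
    dirs.append(os.path.join(site.getuserbase(), "bin"))
    return dirs


def dirs_missing_from_path(path_value=None):
    """Return the candidate script directories that are not listed in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    on_path = {os.path.normpath(p) for p in path_value.split(os.pathsep) if p}
    return [d for d in candidate_script_dirs()
            if os.path.normpath(d) not in on_path]


if __name__ == "__main__":
    for d in dirs_missing_from_path():
        print("Not on PATH: " + d)
```

Run it with the same Python interpreter that pip installed MISO into; any directory it prints is a candidate to add to PATH.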

A better solution in general is to use pip together with virtualenv, creating a virtual environment that contains all the packages needed for a particular task, though that is of course not required.


> 
> 2. As to testing the install, module_availability runs fine. test_miso 
> returns a "Run 0 tests in 0.000s". When I try to execute test_miso from 
> within
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy 
> via python test_miso.py it seems to run the 3 tests mentioned in the 
> documentation, but I end up with errors such as the following
> 
>     .Testing conversion of SAM to BAM...
>     Executing: sam_to_bam --convert /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-data/sam-data/c2c12.Atp2b1.sam /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output/sam-output
>     Converting SAM to BAM...
>     Traceback (most recent call last):
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/sam_to_bam", line 9, in <module>
>         load_entry_point('misopy==0.5.2', 'console_scripts', 'sam_to_bam')()
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 63, in main
>         sam_to_bam(sam_filename, output_dir, header_ref=ref)
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 13, in sam_to_bam
>         os.makedirs(output_dir)
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
>         makedirs(head, mode)
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
>         mkdir(name, mode)
>     OSError: [Errno 13] Permission denied: '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output'
> 
> I don't know why test_miso fails to run properly. Is this something to 
> worry about?

This is a kink that is partly our fault and partly due to the terrible and cumbersome way in which Python packages work.  Our test suite needs to create files.  Since you did not use virtualenv (which would create a mini environment in directories where you have write access), pip installed the package in a system-wide directory (/opt/local/...).  It looks like your user account does not have write access there, so the test suite fails when it tries to create its output files.  I'll work around this in the next release.
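
To confirm that this is a permissions problem, a small check along these lines may help.  It is only a sketch, not something MISO ships, and the function names are hypothetical; it walks up to the nearest existing parent of the directory the tests want to create and asks whether the current user may write there:

```python
import os


def first_existing_ancestor(path):
    """Walk upwards from `path` until a path that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(path):
    """True if the current user appears able to create directory `path`."""
    ancestor = first_existing_ancestor(path)
    # Creating entries in a directory needs write and search permission on it.
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)
```

For the traceback above you would call can_create() on the .../site-packages/misopy/test-output path from the error message; a result of False would match the "Permission denied" error.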

> 
> 3. I have paired-end mouse RNA-seq data which I mapped to the mm10 
> reference genome using tophat. The bam file is sorted and indexed, and I 
> indexed successfully the gff annotation file. Upon running miso with
> 
>     miso --run indexed ../alignment/tophat/WT.bam --settings-filename
>     miso_settings.txt --output-dir WT/ --paired-end 472 277 --read-len 120

Your "--paired-end" parameters look far off: your insert length distribution most likely does not have a mean of 472 and a standard deviation of 277.  The standard deviation looks far too big.  Are you sure 277 is not the variance, which would make the standard deviation sqrt(277) = ~17?
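
To make the variance versus standard deviation point concrete, here is a minimal sketch of how the two "--paired-end" values relate to a list of observed insert lengths.  The function name and the idea of having the insert lengths in a plain Python list are assumptions for illustration only:

```python
import math


def mean_and_sdev(insert_lengths):
    """Return (mean, sample standard deviation) of a list of insert lengths."""
    n = len(insert_lengths)
    if n < 2:
        raise ValueError("need at least two insert lengths")
    mean = sum(insert_lengths) / float(n)
    # Sample variance; the standard deviation is its square root.
    variance = sum((x - mean) ** 2 for x in insert_lengths) / float(n - 1)
    return mean, math.sqrt(variance)


# If 277 were a variance, the corresponding standard deviation would be:
sdev_if_variance = math.sqrt(277)  # about 16.64
```

The second value passed to "--paired-end" should be the square root of the variance, not the variance itself.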

> 
> I get the warning that miso found mixed length reads within the BAM 
> file. Prior to mapping, reads were adapter-trimmed and quality-filtered 
> so naturally aligned reads will have a read-length distribution. I don't 
> understand what to make of this warning. I would assume that most 
> RNA-seq data consists of different read lengths, due to some form of 
> trimming/filtering of the raw data. I don't understand why miso would 
> require reads to have the same length in order to be able to estimate 
> isoform expression. Could you advise how to proceed? The read length 
> distribution shows reads with lengths between 20 and 120 nt. Running 
> miso for each of the read lengths separately would be possible but 
> tedious, requiring 100 separate runs followed by merging the individual 
> output files.

Unfortunately, MISO for now requires all reads to be the same length.  In our experience, trimming adapters can certainly create variability, but a range of 20 to 120 nt is far wider than anything I've seen, and seems extreme.  In most cases, reads hover around a certain length, such that the shortest reads are still basically "as good" as the longest ones.  E.g. if your reads were between 35 and 45 nt, you could just trim all of them to 35 nt, so you'd have the exact same number of reads (just shorter) and you wouldn't need multiple runs.  We will adapt MISO to work with multiple read lengths, but this currently requires substantial changes to the code.
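
As an illustration of the trimming idea, here is a sketch of cutting every read down to a common length.  It assumes you trim the raw reads (sequence plus quality string) before mapping, and the function name is made up; it is not a MISO utility:

```python
def trim_record(seq, qual, target_len):
    """Trim one read and its quality string to `target_len` bases.

    Returns None for reads shorter than `target_len`, since those cannot
    be brought to the common length and have to be dropped.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < target_len:
        return None
    return seq[:target_len], qual[:target_len]
```

With reads of 35 to 45 nt and target_len=35, every read is kept and all of them end up exactly 35 nt long.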

What fraction of your reads would you lose if you kept only reads that are at least 100 nt long?  Since the adapter has a fixed length, I'm assuming most of your trimming is caused by poor base quality.  It seems very extreme to have to trim off over 80% of a read, i.e. going from 120 nt to 20 nt, and it shouldn't happen frequently in a high-quality RNA-Seq run.
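
A quick way to answer that question, sketched under the assumption that you have the read lengths in a Python list (the function name is hypothetical):

```python
def fraction_at_least(read_lengths, min_len):
    """Fraction of reads whose length is at least `min_len`."""
    if not read_lengths:
        return 0.0
    kept = sum(1 for n in read_lengths if n >= min_len)
    return kept / float(len(read_lengths))


# Trimming a 120 nt read down to 20 nt removes this share of the read:
trimmed_share = 1 - 20 / 120.0  # about 0.83
```

One minus fraction_at_least(lengths, 100) is then the fraction of reads you would lose with a 100 nt cutoff.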

Yarden 

> 
> Best regards,
> Maurits
> 
> 
> -- 
> Dr. Maurits Evers
> Center for Integrative Bioinformatics Vienna
> Max F. Perutz Laboratories
> Dr. Bohr Gasse 9
> A-1030 Vienna, Austria
> 
> 
> 
> -- 
> Dr. Maurits Evers
> Statistical Bioinformatics
> Institute of Functional Genomics
> University of Regensburg
> Josef-Engert-Str. 9 (Biopark I)
> 93053 Regensburg, Germany
> _______________________________________________
> miso-users mailing list
> miso-users at mit.edu
> http://mailman.mit.edu/mailman/listinfo/miso-users



