[miso-users] Events with many gene annotations

Yarden Katz yarden at MIT.EDU
Wed Aug 28 18:39:42 EDT 2013


Hi all,

Thanks, Sol, for sharing your very helpful analysis.  Sol was one of the people to note that the larger events are probably mis-annotated.  These annotations are based on ESTs from work done in 2008.

Our revamped annotation set (v2.0 on the website) is generated more conservatively, but we do not yet have the AFE/ALE/TandemUTR v2.0 annotations.  Hopefully these will be compiled soon and can be filtered to remove problematic events like these.
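
In the meantime, here is a rough sketch of the kind of length filter Sol's numbers suggest. It assumes the event definitions are standard 9-column, tab-separated GFF3 with one "gene" record per event. The 1 Mb cutoff is just the figure from Sol's email, not an official MISO threshold, and the function name and example records are made up for illustration.

```python
# Sketch only: flag GFF3 "gene" records longer than a cutoff.
# The 1 Mb cutoff follows Sol's observation; it is an assumption, not a MISO setting.
MAX_EVENT_LEN = 1_000_000


def long_gene_records(gff3_lines, max_len=MAX_EVENT_LEN):
    """Return (ID, length) for every "gene" record longer than max_len."""
    flagged = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # only the event-level "gene" records are measured
        start, end = int(fields[3]), int(fields[4])
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive
        if length > max_len:
            attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
            flagged.append((attrs.get("ID", "?"), length))
    return flagged


# Made-up records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tAFE\tgene\t1000\t2500000\t.\t+\t.\tID=bigEvent;Name=bigEvent",
    "chr1\tAFE\tmRNA\t1000\t2500000\t.\t+\t.\tID=bigEvent.A;Parent=bigEvent",
    "chr2\tAFE\tgene\t5000\t9000\t.\t-\t.\tID=smallEvent;Name=smallEvent",
]
print(long_gene_records(example))
```

To run it on a real file, pass the open file handle (or any iterable of lines) in place of the example list.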

Independently of this, I agree that the method Tyler proposed will be more accurate for intersecting events with genes.  In the case of SE, however, I'm not sure whether it's a feature or a bug that a gene residing in the upstream or downstream intron of the event is reported as an overlapping gene.
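
To make the difference concrete, here is a minimal sketch of the two approaches as I understand them from this thread. It is not the code that generates the mapping file. The function names, the gene names and all coordinates are hypothetical, and genes are treated as single intervals rather than as sets of exons.

```python
# Sketch only: names and coordinates are hypothetical, not MISO's mapping code.

def overlaps(a, b):
    """True if two (chrom, start, end) intervals share a base (inclusive ends)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_by_span(event_exons, genes):
    """Span-based mapping: overlap genes with the outermost event coordinates."""
    chrom = event_exons[0][0]
    span = (chrom, min(e[1] for e in event_exons), max(e[2] for e in event_exons))
    return {name for name, iv in genes.items() if overlaps(span, iv)}


def genes_by_exons(event_exons, genes, require_all=True):
    """Exon-based mapping: overlap each event exon, then combine per event."""
    per_exon = [{name for name, iv in genes.items() if overlaps(exon, iv)}
                for exon in event_exons]
    # require_all=True keeps genes hitting every exon; False keeps genes hitting any.
    combine = set.intersection if require_all else set.union
    return combine(*per_exon) if per_exon else set()


# A made-up SE event: upstream, skipped and downstream exons.
se_exons = [("chr1", 1000, 1100), ("chr1", 2000, 2100), ("chr1", 5000, 5100)]
genes = {
    "HOST": ("chr1", 900, 5200),       # contains all three exons
    "INTRONIC": ("chr1", 1500, 1800),  # sits inside the upstream intron
}
print(sorted(genes_by_span(se_exons, genes)))
print(sorted(genes_by_exons(se_exons, genes)))
```

With the span-based version the intronic gene is picked up along with the host gene, while the exon-based version reports only the host gene. Whether dropping the intronic gene is desirable is exactly the feature-or-bug question above.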

Best, --Yarden



On Aug 28, 2013, at 5:51 PM, Sol Katzman wrote:

> Dear Yarden and Tyler,
> 
> A while back, I noticed some performance problems processing AFE/ALE events.
> 
> I extracted the distributions of the lengths of the "gene" items in the gff3
> (hg19) event definitions. There are many (1100+/500+) AFE/ALE events over 1Mb in length.
> Only a handful (10) such SE events.
> 
> I will send my stats in a follow-up email.
> 
> I think that the events longer than 1Mb are pretty questionable.
> 
> /Sol.
> 
> On 8/28/2013 2:16 PM, Tyler Funnell wrote:
>> Hi Yarden,
>> 
>> Yes, that's right. The problem is most noticeable for the ALE/AFE events for the reason you mentioned, but I think the current event-to-gene mapping could have improper annotations for other event types as well. For example, small genes that lie within the introns of an SE event would be picked up.
>> 
>> Cheers,
>> Tyler
>> 
>> 
>> On Aug 28, 2013, at 2:03 PM, Yarden Katz <yarden at mit.edu> wrote:
>> 
>>> Hi Tyler,
>>> 
>>> Some of the AFE/ALE annotations, which we are currently reworking, span very large genomic coordinates, as you noted.  I believe these are probably dubious/faulty annotations.  But in any case, as you say, if you overlap the outermost coordinates with genes there will potentially be many overlapping genes.
>>> 
>>> If I understand correctly, you're proposing to overlap the first exon with all genes, then the second exon with all genes, and take the intersection of those?
>>> 
>>> Best, --Yarden
>>> 
>>> On Aug 27, 2013, at 10:31 PM, Tyler Funnell wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I've noticed that for some alternative events, there are many gene annotations in the event-to-Ensembl-ID mapping file. For example, AFE event 83896@uc002kgt.1@uc002hvt.1 has quite a few. I think this might be because the left-most and right-most coordinates for this particular event cover a large section of the chromosome and the gene mappings are based on these coordinates. If I'm right, I think a better way would be to get the overlap between genes (or gene exons) and individual event exons first, then merge to the event level.
>>>> 
>>>> Thank you,
>>>> Tyler
>>>> 
>>>> 
>>>> _______________________________________________
>>>> miso-users mailing list
>>>> miso-users at mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>>> 
>> 