[miso-users] Events with many gene annotations
Yarden Katz
yarden at MIT.EDU
Wed Aug 28 18:39:42 EDT 2013
Hi all,
Thanks, Sol, for sharing your very helpful analysis. Sol was one of the people who noted that the larger events are probably mis-annotated. These annotations are based on ESTs from work done in 2008.
Our revamped annotation set (v2.0 on the website) is generated more conservatively, but we do not yet have the AFE/ALE/TandemUTR v2.0 annotations. Hopefully these will be compiled soon and can be filtered to remove problematic events like these.
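As a rough sketch of the kind of filtering Sol's analysis suggests, the Python below reads the "gene" records of a MISO GFF3 event file, measures each event's span, and flags those longer than 1 Mb. It assumes standard nine-column, tab-separated GFF3 in which each event's outermost extent is its "gene" record carrying an ID attribute; the function names, the threshold constant, and the toy records are illustrative only and are not part of MISO.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative threshold taken from Sol's observation that events over 1 Mb look suspect.
MAX_EVENT_SPAN = 1_000_000


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def split_by_span(gff_lines: Iterable[str],
                  max_span: int = MAX_EVENT_SPAN) -> Tuple[List[str], List[str]]:
    """Return (kept_ids, flagged_ids) for the "gene" records of a GFF3 event file.

    Assumes each "gene" record spans one whole event. GFF3 coordinates are
    1-based and inclusive, so the span is end - start + 1.
    """
    kept, flagged = [], []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only the event-level "gene" records are measured
        start, end = int(cols[3]), int(cols[4])
        event_id = parse_attributes(cols[8]).get("ID", "")
        span = end - start + 1
        (flagged if span > max_span else kept).append(event_id)
    return kept, flagged


if __name__ == "__main__":
    # Toy records with hypothetical IDs, not real MISO events.
    toy = [
        "##gff-version 3\n",
        "chr1\tAFE\tgene\t1000\t5000\t.\t+\t.\tID=evA;Name=evA\n",
        "chr1\tAFE\tmRNA\t1000\t5000\t.\t+\t.\tID=evA.A;Parent=evA\n",
        "chr2\tAFE\tgene\t100\t2000100\t.\t-\t.\tID=evB;Name=evB\n",
    ]
    kept, flagged = split_by_span(toy)
    print(kept, flagged)  # prints ['evA'] ['evB']
```

Running it once per event-type file would reproduce counts like Sol's AFE/ALE vs. SE comparison; the threshold is a parameter because 1 Mb is a heuristic cut-off, not an established one.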
Independently of this, I agree that the method Tyler proposed will be more accurate for intersecting events with genes. In the case of SE events, however, I'm not sure whether it's a feature or a bug that a gene residing in the upstream or downstream intron of the event is reported as an overlapping gene.
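A minimal sketch of the difference between the current span-based mapping and the per-exon approach Tyler proposed. It assumes all intervals lie on the same chromosome and strand and uses 1-based inclusive coordinates; the function names, gene names, and toy coordinates are hypothetical. The require_all flag covers both readings in the thread: a union of the per-exon hits (Tyler's "merge to the event level") or their intersection.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive as in GFF3


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def genes_by_span(event_exons: List[Interval], genes: Dict[str, Interval]) -> Set[str]:
    """Span-based mapping: overlap genes with the event's outermost coordinates."""
    span = (min(s for s, _ in event_exons), max(e for _, e in event_exons))
    return {g for g, iv in genes.items() if overlaps(iv, span)}


def genes_by_exon(event_exons: List[Interval], genes: Dict[str, Interval],
                  require_all: bool = False) -> Set[str]:
    """Per-exon mapping: overlap each event exon with genes, then combine.

    require_all=False returns the union of per-exon hits (any exon overlaps);
    require_all=True returns their intersection (every exon overlaps).
    """
    per_exon = [{g for g, iv in genes.items() if overlaps(iv, ex)} for ex in event_exons]
    if not per_exon:
        return set()
    return set.intersection(*per_exon) if require_all else set.union(*per_exon)


if __name__ == "__main__":
    # Toy SE event: upstream exon, skipped exon, downstream exon.
    se_exons = [(100, 200), (500, 600), (900, 1000)]
    toy_genes = {
        "host": (50, 1200),        # gene containing the whole event
        "nested": (300, 400),      # small gene inside the upstream intron
        "neighbor": (950, 3000),   # gene overlapping only the downstream exon
    }
    print(sorted(genes_by_span(se_exons, toy_genes)))   # ['host', 'neighbor', 'nested']
    print(sorted(genes_by_exon(se_exons, toy_genes)))   # ['host', 'neighbor']
    print(sorted(genes_by_exon(se_exons, toy_genes, require_all=True)))  # ['host']
```

In the toy SE event, the intron-resident gene is reported only by the span-based method, which is the case discussed in this thread. Whether a gene touching just one exon should count is the union-versus-intersection choice.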
Best, --Yarden
On Aug 28, 2013, at 5:51 PM, Sol Katzman wrote:
> Dear Yarden and Tyler,
>
> A while back, I noticed some performance problems processing AFE/ALE events.
>
> I extracted the distributions of the lengths of the "gene" items in the gff3
> (hg19) event definitions. There are many AFE/ALE events (1100+/500+) over 1Mb in length,
> but only a handful (10) such SE events.
>
> I will send my stats in a follow-up email.
>
> I think that the events longer than 1Mb are pretty questionable.
>
> /Sol.
>
> On 8/28/2013 2:16 PM, Tyler Funnell wrote:
>> Hi Yarden,
>>
>> Yes, that's right. The problem is most noticeable for the ALE/AFE events for the reason you mentioned, but I think the current event-to-gene mapping could have improper annotations for other event types as well. For example, small genes that lie within the introns of an SE event would be picked up.
>>
>> Cheers,
>> Tyler
>>
>>
>> On Aug 28, 2013, at 2:03 PM, Yarden Katz <yarden at mit.edu> wrote:
>>
>>> Hi Tyler,
>>>
>>> Some of the AFE/ALE annotations, which we are currently reworking, span very large genomic regions, as you noted. I believe these are probably dubious/faulty annotations. But in any case, as you say, if you overlap the outermost coordinates with genes, there will potentially be many overlapping genes.
>>>
>>> If I understand correctly, you're proposing to merge the first exon with all genes, then the second exon with all genes, and take the intersection of those?
>>>
>>> Best, --Yarden
>>>
>>> On Aug 27, 2013, at 10:31 PM, Tyler Funnell wrote:
>>>
>>>> Hello,
>>>>
>>>> I've noticed that for some alternative events, there are many gene annotations in the event-to-Ensembl ID mapping file. For example, AFE event 83896 at uc002kgt.1@uc002hvt.1 has quite a few. I think this might be because the left-most and right-most coordinates for this particular event cover a large section of the chromosome, and the gene mappings are based on these coordinates. If I'm right, I think a better way would be to get the overlap between genes (or gene exons) and individual event exons first, then merge to the event level.
>>>>
>>>> Thank you,
>>>> Tyler
>>>>
>>>>
>>>> _______________________________________________
>>>> miso-users mailing list
>>>> miso-users at mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/miso-users
>>>
>>
>>
>>