[Datasets] Annotating home activity

Mon Aug 3 22:47:17 EDT 2009

One issue that came up in our coding of office activities was time granularity.  Rather than coding at a very fine time granularity, I seem to recall we decided to code whether each activity occurred in each 5-second window.  This made it easier to code multiple activities in a single pass, which was important to us because we had a lot of overlapping activities (sitting, typing, looking at the computer monitor).
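
For concreteness, here is a minimal Python sketch of that kind of windowed coding. It is only an illustration, not the tooling used for the office study: the function name code_windows, the interval input format, and the rule that any overlap with a window counts as "occurred" are all assumptions.

```python
from math import ceil

WINDOW_SECONDS = 5  # window length described above


def code_windows(intervals, total_seconds, window=WINDOW_SECONDS):
    """Turn per-activity time spans into one True/False flag per window.

    intervals: {activity: [(start_sec, end_sec), ...]}  (hypothetical format)
    Assumption: an activity "occurred" in a window if any of its spans
    overlaps that window at all.
    """
    n_windows = ceil(total_seconds / window)
    coded = {}
    for activity, spans in intervals.items():
        flags = []
        for w in range(n_windows):
            w_start = w * window
            w_end = w_start + window
            # Half-open overlap test between [s, e) and [w_start, w_end).
            flags.append(any(s < w_end and e > w_start for s, e in spans))
        coded[activity] = flags
    return coded


# Overlapping activities are coded independently in the same pass.
observed = {
    "sitting": [(0, 20)],
    "typing": [(3, 8), (12, 14)],
}
coded = code_windows(observed, total_seconds=20)
```

Each activity gets its own row of flags, so overlapping activities such as sitting and typing do not compete for the same window.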

Hopefully that makes sense.  Our CHI_2003/TOCHI_2005 paper might have more detail, or I'd be willing to talk sometime about how we did our coding.  Not sure whether it was a good or bad approach, but I can at least tell you what it was.  :)

James

--
James A. Fogarty, Assistant Professor
Computer Science & Engineering, University of Washington

http://www.cs.washington.edu/homes/jfogarty/  

-----Original Message-----
From: datasets-bounces at mit.edu [mailto:datasets-bounces at mit.edu] On Behalf Of Jason Nawyn
Sent: Monday, August 03, 2009 8:10 AM
To: datasets at mit.edu
Subject: [Datasets] Annotating home activity

As part of our home data collection project, we will be making our
multimodal sensor datasets publicly available for use in the research
community.  In order to enhance the value of these resources, we plan
to provide observational annotations for a broad selection of
high-level activities that typically occur in the home.  Sample
activities to be annotated include: brushing teeth, exercising,
mopping, ironing, preparing a snack, etc.  A full list of the proposed
items is available at: http://boxlab.wikispaces.com/Annotation

In order to reflect the ambiguity that naturally occurs as an  
individual transitions among (often simultaneous) activities, we are  
proposing a discrete rating scale that will be applied to activity  
labels according to the annotator's confidence that the indicated  
behavior is currently happening.  We are presently seeking feedback on  
this strategy. The proposed "certainty values" are as follows:
	- Not happening
	- Possibly
	- Likely
	- Definitely

These values will be time stamped and stored as integers (0-3) in the  
annotation data file along with the activity labels.  While an  
activity is ongoing, annotators will be encouraged to modify their  
certainty ratings according to whether the observed behavior increases  
or decreases the likelihood that the activity is occurring. Certainty  
ratings should reflect ambiguity from a number of sources.  Here are  
some suggestions for how this system might be applied:
	- Behaviors that indicate preparation for an activity but are not
	  intrinsic to the common-sense definition of the activity might be
	  included in the annotation, but at lower certainty values
	- If an individual is engaged in an ongoing activity but briefly
	  focuses on another task, the pause may be reflected by marking the
	  first activity with lower certainty
	- Long pauses that clearly interrupt the current activity may result
	  in a rating of "Not happening" for that activity
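
To make the proposal concrete, here is a minimal Python sketch of how such time-stamped certainty ratings might be represented and queried. It is an illustration under stated assumptions, not the actual annotation file format: the names Certainty, Rating, and certainty_at are hypothetical, the mapping of 0-3 onto the four labels in the order listed is assumed, and timestamps are assumed to be seconds from the start of the recording.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    # Assumed mapping: labels numbered 0-3 in the order listed above.
    NOT_HAPPENING = 0
    POSSIBLY = 1
    LIKELY = 2
    DEFINITELY = 3


@dataclass(frozen=True)
class Rating:
    timestamp: float      # seconds from start of recording (assumed unit)
    activity: str         # activity label, e.g. "preparing a snack"
    certainty: Certainty  # stored as an integer 0-3


def certainty_at(ratings, activity, t):
    """Return the most recent rating for `activity` at or before time t.

    Assumption: a rating stays in effect until the annotator changes it,
    and an activity with no rating yet is treated as NOT_HAPPENING.
    """
    current = Certainty.NOT_HAPPENING
    for r in sorted(ratings, key=lambda r: r.timestamp):
        if r.activity == activity and r.timestamp <= t:
            current = r.certainty
    return current


ratings = [
    Rating(0, "preparing a snack", Certainty.POSSIBLY),         # preparation
    Rating(30, "preparing a snack", Certainty.DEFINITELY),
    Rating(90, "preparing a snack", Certainty.LIKELY),          # brief pause
    Rating(200, "preparing a snack", Certainty.NOT_HAPPENING),  # long pause
]
```

With this representation, certainty_at(ratings, "preparing a snack", 100) looks up the rating in effect during the brief pause, and int(rating.certainty) gives the 0-3 value that would be written to the data file.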

We are currently testing a user interface that makes it relatively  
easy to apply these labels, and would welcome any comments you might  
have about this or other strategies to introduce confidence levels  
into annotation sets (preferably without requiring multiple  
independent coders).  If you don't see the value of this exercise,  
we'd appreciate that perspective too.

Thanks,

Jason
