Monday, March 5, 2007

Bad Results

We finished implementing our behavior descriptor based solely on displacement lines. We tested it with cross-validation, holding out one set at a time for testing and training on the others. Unfortunately, the descriptor hasn't performed as expected: most of our sleep dataset is being mislabeled as exploring. We suspect a bug in our code, but we haven't found it yet. Intuitively, sleeping clips shouldn't produce cuboids: there's no movement during those video clips, so the response function doesn't pick up anything, and the displacement graph is essentially all zeros for that type of clip. Conversely, there's no reason an exploring clip would have a displacement graph consisting of all zeros.

Meanwhile, we've also been looking at different metrics to use besides Euclidean distance.
We're currently considering chi-squared and Mahalanobis in addition to Euclidean.
The reason we're inclined towards using the Mahalanobis distance is that it takes into account the covariance among the variables in calculating distances. Doing this solves problems related to scale and correlation in Euclidean distances. When using Euclidean distance, the set of points equidistant from a given location is a sphere. The Mahalanobis distance stretches this sphere to correct for the respective scales of the different variables, and to account for correlation among variables.

from http://matlabdatamining.blogspot.com/2006/11/mahalanobis-distance.html
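
As a reference for ourselves, here's a minimal MATLAB sketch of the three candidate metrics (variable names are ours, and the data is just stand-in random histograms):

    X = rand(50, 5);                    % stand-in: 50 training descriptors, 5 dims
    S = cov(X);                         % covariance estimated from the training data
    x = X(1,:)';  y = X(2,:)';          % two descriptors to compare

    d_euc = norm(x - y);                            % Euclidean
    d_chi = 0.5 * sum((x - y).^2 ./ (x + y + eps)); % chi-squared (for histograms)
    d_mah = sqrt((x - y)' * (S \ (x - y)));         % Mahalanobis; S\v avoids inv(S)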

We'll be testing other distance metrics once we've fixed this bug.


Update: we're currently implementing k-NN to do the line comparisons; hopefully this will solve the problem.
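
A rough sketch of the k-NN step we have in mind (all names are ours, and this assumes each displacement line has already been resampled to a fixed length so the rows are comparable):

    % Stand-in data: each row is a displacement line resampled to 100 frames.
    trainLines  = rand(30, 100);
    trainLabels = randi(5, 30, 1);          % numeric behavior labels
    testLine    = rand(1, 100);
    k = 3;

    diffs = trainLines - repmat(testLine, size(trainLines, 1), 1);
    dists = sqrt(sum(diffs.^2, 2));         % Euclidean distance to each training line
    [~, idx] = sort(dists);
    label = mode(trainLabels(idx(1:k)));    % majority vote among the k nearest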

Wednesday, February 21, 2007

Detecting Movement cont.

We tried using blob detection to approximate the movement of the mouse. We took the centroid of the blobs to represent the mouse's center of mass, so the change in the centroid's position would approximate the change in the mouse's position. However, using blobs this way turned out to be chaotic: we picked up more noise than actual mouse movement.

To resolve this issue, we tried using the cuboids instead. We compute a binary image of the cuboids that appear in each frame, then calculate a centroid from that image. The centroid serves the same purpose as before: approximating the mouse's center of mass.
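
The centroid step is simple; a minimal sketch (variable names are ours, and bw stands in for the binary image of detected cuboids in one frame):

    bw = false(240, 320);                % stand-in binary frame
    bw(100:110, 150:165) = true;         % pretend cuboids fired here
    [r, c] = find(bw);                   % coordinates of all "on" pixels
    cx = mean(c);                        % centroid x
    cy = mean(r);                        % centroid y
    % tracking (cx, cy) across frames gives the x and y displacement graphs

Here are the unfiltered graphs of the mouse movement obtained this way: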

Unfiltered x and y displacement for exploring

Unfiltered x and y displacement for grooming

We noted a distinctive pattern between exploring and grooming: the x and y displacements while the mouse is exploring are greater than while it is grooming, and the displacements during grooming tend to stay around zero. This is intuitive, since the mouse does not move around as much when it is grooming as when it is exploring. We want to add this result as a feature alongside Piotr's cuboids, but we need to filter out more noise first. We used median and average filtering to obtain the following results (a sketch of the filtering appears after the plots):

Filtered x displacement for exploring

Filtered y displacement for exploring

Filtered x displacement for grooming

Filtered y displacement for grooming
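
The filtering itself is straightforward; a sketch under our own parameter choices (the window size is arbitrary, and medfilt1 comes from the Signal Processing Toolbox):

    dx = randn(1, 300);                       % stand-in raw x-displacement signal
    w  = 5;                                   % window size; our choice, needs tuning
    dxAvg = conv(dx, ones(1, w)/w, 'same');   % average (moving-mean) filter
    dxMed = medfilt1(dx, w);                  % median filter (Signal Processing Toolbox)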

Next steps:
- Create displacement graphs for the whole training set
- Scale the graphs to 100 frames for consistency (see the sketch below)
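
The scaling can be a simple linear resampling; a minimal sketch using MATLAB's interp1 (names are ours):

    g = randn(1, 237);                                          % a graph of arbitrary length
    g100 = interp1(1:numel(g), g, linspace(1, numel(g), 100));  % resampled to 100 frames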

Monday, February 12, 2007

Detecting movement

After last week's class, we've implemented a way of detecting movement suggested by Serge. By subtracting the average background for each frame and binarizing the image, we can get a representation of movement.
We still need to work on the threshold used and clean up the blobs a bit, but it looks promising.
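
In outline, the method looks like this (a sketch with our own names; the threshold value is the part that still needs tuning):

    vid = rand(240, 320, 50);            % stand-in grayscale clip, values in [0,1]
    bg  = mean(vid, 3);                  % average background over the clip
    thresh = 0.2;                        % the threshold we still need to tune
    bw = abs(vid(:,:,1) - bg) > thresh;  % binarized difference image for frame 1
    % repeating this for every frame gives the blob clips shown below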

Here is the clip of blobs obtained from explore001 from set00:


Here are the blobs from groom001 from set00:


Current work:
1. Clean up our binary images, adjust the threshold to keep the number of blobs small.
2. Use each blob's centroid to create a displacement graph with respect to time in X and Y directions.
3. Create displacement prototypes by clustering these graphs together (see the sketch after this list).
4. Add this information to our behavior descriptor.
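
For the clustering in step 3, we'll likely use something like k-means. A sketch with our own names (kmeans is from the Statistics Toolbox, and the number of prototypes is a guess):

    G = rand(60, 100);                   % stand-in: 60 graphs, each resampled to 100 frames
    K = 5;                               % number of prototypes; our guess
    [labels, protos] = kmeans(G, K);     % Statistics Toolbox
    % protos(i,:) is the i-th displacement prototype; labels maps each
    % training graph to its nearest prototype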

Monday, February 5, 2007

Most commonly mislabeled behavior

The memory problem was solved by saving results and clearing the workspace after work on a specific clip is done; unfortunately, this makes the code run a bit slower (5+ hrs). For now, we'll only be using a subset of the whole dataset: set00 through set03. One of the most commonly mislabeled behaviors using the cuboids code is grooming, which is most often labeled as exploring. Grooming is characterized by the movement of the mouse's paws across its face, or its face across its legs, while the mouse stays in the same place. Drinking is also commonly mislabeled as exploring. We believe the main difference between these behaviors and exploring is the movement of the mouse from one place to another. By keeping track of where cuboids are detected and incorporating that data into the behavior descriptor, we believe we can increase the accuracy on these behaviors.
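
Schematically, the memory fix looks like this (file names and layout are hypothetical; the real loop runs the cuboids code on each clip):

    clips = dir('set00/*.avi');          % hypothetical data layout
    for i = 1:numel(clips)
        % ... run the cuboids code on clips(i).name to get a descriptor ...
        desc = rand(1, 50);              % stand-in for the real descriptor
        save(sprintf('desc_%03d.mat', i), 'desc');
        clear desc;                      % keep the workspace small between clips
    end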

The following videos show the cuboids as they're detected by the response function.
(still waiting for processing by Google Video)

Here we have a sample clip for grooming where the mouse stays in the same spot and mostly just moves its paws and face:


A grooming clip where the mouse moves around a bit:



A sample clip for drinking:


A clip for exploring:

Monday, January 29, 2007

Current work

Now that we've gotten familiar with Piotr's code, our next step is to run it using the entire smart vivarium dataset. We'll look at the results to find which video clips were labeled incorrectly and figure out how positional information can help prevent those errors.

cuboids!

We've been going over Piotr's MATLAB code for cuboids and feel comfortable using it now. We first ran the recognition demo on the face dataset; afterwards, we modified the demo to run on the mouse behavior dataset. Due to memory constraints, we couldn't finish running it, but we'll fix this by tonight.

Here's a sample clip from the smart vivarium dataset, drink02.avi from set00:



Here are the cuboids obtained from that video clip, set to loop 10 times; each cuboid lasts approximately one second:


Here we have a sample clip of the cuboids clustered together by prototypes from the smart vivarium dataset:




copyright info:
This database is Copyright © 2005 The Regents of the University of California. All Rights Reserved. Permission to use, copy, modify, and distribute this database and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies. Permission to incorporate this database into commercial products may be obtained by contacting:

Technology Transfer Office
University of California
9500 Gilman Drive, Mail Code 0910
La Jolla, CA 92093-0910
(858) 534-5815
invent@ucsd.edu

This database and documentation are copyrighted by The Regents of the University of California. The database and documentation are supplied "as is", without any accompanying services from The Regents.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS DATABASE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE DATABASE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Wednesday, January 24, 2007

Cuboids Code

Piotr gave us access to his cuboids package. We will start learning how the code works and will show the cuboids on Monday.

Monday, January 22, 2007

Our Behavior Descriptor

We e-mailed Piotr last week asking for his implementation of cuboids. Since we'll be going through his code for the detection step, we've decided to start working on our behavior descriptor; specifically, on how we'll represent the spatial relationships between cuboids.
Agarwal et al. keep track of spatial relationships between detected parts by dividing the angles between pairs into 45-degree bins and measuring the distance between parts in units of window size. They represent this information in the feature vector of each training image, which is set up as a series of binary features indicating whether or not a given part or relationship is present.
Our task is to extend this into the spatio-temporal domain.
Possible ways to do this are:
  1. Calculate the distance and angle between each pair of cuboids in x, y coordinates, and store the time difference in a separate field.
  2. Calculate the Euclidean distance between each pair of cuboids in 3D, using x, y, and t coordinates.
Once we have the relationships between parts, we'll include them in our final behavior descriptor.
We'll most likely use a histogram of the cuboid types present and the relationships between them.
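
A sketch of option 1 (names are ours; P stands in for the x, y, t locations of the detected cuboids, and the bin arithmetic mirrors the 45-degree binning from Agarwal et al.):

    P = [100*rand(10,2), 30*rand(10,1)]; % stand-in: 10 cuboids, columns x, y, t
    nBins = 8;                           % 45-degree angle bins, as in Agarwal et al.
    for i = 1:size(P,1)
        for j = i+1:size(P,1)
            d   = norm(P(i,1:2) - P(j,1:2));                      % spatial distance
            ang = atan2(P(j,2)-P(i,2), P(j,1)-P(i,1));            % angle of the pair
            bin = mod(floor((ang + pi)/(2*pi/nBins)), nBins) + 1; % angle bin, 1..8
            dt  = abs(P(i,3) - P(j,3));                           % time difference (option 1)
            % ... accumulate (d, bin, dt) into the histogram descriptor ...
        end
    end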

Last week's presentation

We've uploaded last week's presentation at:
http://www.sharebigfile.com/file/66521/cuboidsIntro-ppt.html
It's based on the presentations linked in the previous post below.

Wednesday, January 10, 2007

Useful Links

Here are a few links and brief summaries for some of the papers we'll be working from:

Behavior Recognition via Sparse Spatio-Temporal Features by Dollár et al.
The main paper we'll be using for this project. It introduces a response function based on a quadrature pair of Gabor filters applied temporally and a 2D Gaussian applied along the spatial dimensions. Cuboids (small spatio-temporal video patches) are extracted at each local maximum of the response function applied to a video clip. A transformation (the paper tests several) is then applied to each cuboid to create a feature vector. Since the number of possible cuboids is large but only a few types actually occur, similar cuboids are clustered together to form cuboid prototypes. Each behavior is then described by the set of cuboid prototypes present in a video clip.
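
From our reading of the paper, the response function looks roughly like the following (a sketch, not Piotr's actual code; parameter values are our own, and fspecial is from the Image Processing Toolbox):

    % Our reading: R = (I*g*hev).^2 + (I*g*hod).^2, with g a 2D spatial
    % Gaussian and (hev, hod) a quadrature pair of 1D temporal Gabor filters.
    vid = rand(64, 64, 40);              % stand-in grayscale clip
    sigma = 2;  tau = 2.5;  w = 4/tau;   % parameter choices are ours

    t   = -ceil(2*tau):ceil(2*tau);                     % temporal support
    hev = -cos(2*pi*t*w) .* exp(-t.^2/tau^2);           % even Gabor
    hod = -sin(2*pi*t*w) .* exp(-t.^2/tau^2);           % odd Gabor

    g = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma); % Image Processing Toolbox
    S = zeros(size(vid));
    for k = 1:size(vid,3)
        S(:,:,k) = conv2(vid(:,:,k), g, 'same');        % spatial smoothing
    end
    R = convn(S, reshape(hev,1,1,[]), 'same').^2 + ...
        convn(S, reshape(hod,1,1,[]), 'same').^2;       % response function
    % cuboids are extracted around the local maxima of R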

Learning to Detect Objects in Images via a Sparse, Part-Based Representation by Agarwal et al.
One of the first papers to use sparse features for object detection. Agarwal et al. use the Förstner corner detector to find interest points, then use 2D windows around those points to create a vocabulary of parts. Objects are described by the presence of parts and their positions relative to one another. SNoW is used to train a classifier based on these features.

Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words by Niebles et al.
Based on Dollár et al., this paper uses the same response function; however, it uses a probabilistic Latent Semantic Analysis model (we're not quite sure how this works yet) to determine behavior.


Some useful tutorials and manuals:
Gabor Filters
A fairly complex tutorial; we're still having a hard time completely understanding Gabor filters.
SNoW (Sparse Network of Winnows)
Roth describes it as a "multi-class classifier"; the executable is available on their website. We might use this to include relative cuboid positioning in our project.

Presentations:
Object Recognition using sparse features
Presentation for Agarwal et al.'s paper. Includes a short demo on SNoW.
Behavior Recognition using Cuboids
Presentation for Dollár et al.'s paper.

Datasets:
Mouse behavior dataset
Sets of clips obtained from the smart vivarium.