Action recognition is an important yet challenging task in computer vision. A successful and widely used framework in this field is the Bag of Visual Words (BoVW), whose first step is to extract local features. One critical property of local features is that they are often multi-view; for example, the dense trajectory feature captures both appearance and motion properties. The different feature types are aligned together during coding and pooling, which leaves the process heavily entangled. Our motivation is to disentangle each sub-descriptor and l...