VideoLT: Large-scale Long-tailed Video Recognition


DOWNLOAD


V1.0 is available now! Want to give it a try? :)

Annotation | Raw Videos (2.1T)


Description:

  • The ResNet-50, ResNet-101 and TSM(ResNet-50) features have shape [150, 2048] for each video (150 frames, one 2048-dimensional feature vector per frame).
  • The ResNet-50 and ResNet-101 features are extracted with models pretrained on ImageNet; the TSM(ResNet-50) features are extracted with a model pretrained on Kinetics.
  • Annotation format:

    train.lst, validate.lst, test.lst

                            
    vid label_id
    xYjQkWxF8h0 126
    GtDLgqe-qiM 382
    u9KATdP5bNo 369
    04qmkPTuRmQ 155
    BFh8aa7asvw 745
    6U4SxTJ71Xk 425
    _7iRdKirjIk 282
    QRgUdZUyu1U 722
    ...
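
The split lists shown above are plain two-column text, so they can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `read_split` is hypothetical, and it assumes whitespace-separated columns with an optional `vid label_id` header line, as in the excerpt.

```python
import io
from typing import Dict, TextIO


def read_split(handle: TextIO) -> Dict[str, int]:
    """Parse 'vid label_id' lines into a {vid: label_id} mapping.

    Assumption: columns are whitespace-separated. The header line and
    any line that does not have exactly two fields are skipped.
    """
    labels: Dict[str, int] = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2 or parts == ["vid", "label_id"]:
            continue  # blank line, header, or malformed row
        vid, label_id = parts
        labels[vid] = int(label_id)
    return labels


# Demo on the first rows of the excerpt; for a real file, pass
# open("train.lst") instead of the in-memory sample.
sample = io.StringIO(
    "vid label_id\n"
    "xYjQkWxF8h0 126\n"
    "GtDLgqe-qiM 382\n"
    "u9KATdP5bNo 369\n"
)
train_labels = read_split(sample)
print(len(train_labels), train_labels["GtDLgqe-qiM"])
```

The same function should work for validate.lst and test.lst, assuming all three files share this layout.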
                            
                            

    count-labels-train.lst

                                
    label label_id num
    3DPainting 0 137.0
    3DPrinter 1 308.0
    ACappella 2 87.0
    ATM 3 60.0
    AngkorWat 4 233.0
    BabyLearningToEatWithSpoon 5 212.0
    BigBen 6 102.0
    ChineseBrushWriting 7 117.0
    ...
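
Because count-labels-train.lst gives the number of training videos per class, it is a convenient way to inspect the long-tailed distribution. The sketch below is illustrative only: `LabelCount` and `read_label_counts` are hypothetical names, and it assumes that class names contain no spaces and that `num` is written as a float, as in the excerpt.

```python
import io
from typing import List, NamedTuple, TextIO


class LabelCount(NamedTuple):
    label: str
    label_id: int
    num: int


def read_label_counts(handle: TextIO) -> List[LabelCount]:
    """Parse 'label label_id num' lines into a list of LabelCount rows.

    Assumption: three whitespace-separated fields per row; the header
    line and rows with a different field count are skipped.
    """
    rows: List[LabelCount] = []
    for line in handle:
        parts = line.split()
        if len(parts) != 3 or parts == ["label", "label_id", "num"]:
            continue
        label, label_id, num = parts
        # Counts appear as floats such as '137.0'; store them as integers.
        rows.append(LabelCount(label, int(label_id), int(float(num))))
    return rows


# Demo on the first rows of the excerpt; for a real file, pass
# open("count-labels-train.lst") instead of the in-memory sample.
sample = io.StringIO(
    "label label_id num\n"
    "3DPainting 0 137.0\n"
    "3DPrinter 1 308.0\n"
    "ACappella 2 87.0\n"
    "ATM 3 60.0\n"
)
counts = read_label_counts(sample)

# Sort from most to least frequent class to see head and tail classes.
by_frequency = sorted(counts, key=lambda row: row.num, reverse=True)
print(by_frequency[0].label, by_frequency[-1].label)
```

Sorting by `num` puts the head classes first and the tail classes last, which is usually the first step when choosing a re-sampling or re-weighting strategy.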
                                
                            

News:

  • (04/25/2021) VideoLT v1.0 is released; the download link is coming soon...
  • [Important] Currently, VideoLT can only be accessed by sending us an e-mail: zhangxing18@fudan.edu.cn.


  • Features:

    ResNet-50 Feature (295G) | ResNet-101 Feature (295G) | TSM(ResNet-50) Feature (295G)



    Citation:

    If VideoLT helps your work, please consider citing:

                        
    @misc{zhang2021videolt,
    title={VideoLT: Large-scale Long-tailed Video Recognition}, 
    author={Xing Zhang and Zuxuan Wu and Zejia Weng and Huazhu Fu and Jingjing Chen and Yu-Gang Jiang and Larry Davis},
    year={2021},
    eprint={2105.02668},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
    }