Posts in the 'Deep Learning' category

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data (Canary)

Summary of the Llama 3.1 Vision Language Model (Llama 3-V)

Knowing Where to Focus: Event-aware Transformer for Video Grounding Review [ICCV 2023]

A Quick Taste of Large Language Models (LLMs): Focusing on GPT and LLaMA

Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning Review [CVPR 2022]

An Information-Theoretic Understanding of Maximum Manifold Capacity Representations Review [NeurIPS 2023 Workshop]

Efficient Coding of Natural Images using Maximum Manifold Capacity Representations Review [NeurIPS 2023]

Localizing Moments in Long Video Via Multimodal Guidance Review [ICCV 2023]

QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries (Moment-DETR) Review [NeurIPS 2021]

Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation Review [ICLR 2022]

RELIT: Weakly Supervised Vision-and-Language Pre-training with Relative Representation Review [arXiv 2023]

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment Review [CVPR 2022]
