Luchuan Song

I am a third-year Ph.D. student in the Computer Science Department at the University of Rochester (UofR). My advisor is Prof. Chenliang Xu. Before that, I received my master's and bachelor's degrees from the University of Science and Technology of China (USTC) under the supervision of Prof. Nenghai Yu and Prof. Bin Liu.

I focus on human-related topics (e.g., face animation and stylization, 3D face reconstruction, and deepfake detection). I also work on egocentric video understanding, object detection/segmentation, and optical character recognition (OCR). Outside of research, I often go mountaineering and have summited the Muztagh Ata and Chola mountains.

Email / Google Scholar / GitHub / LinkedIn

Photo: On the way to Kawagarbo, Dec. 2022
News
[07/2024] Invited tutorial talk on Multimedia Deepfake Detection @ ICME 2024.
[05/2024] I will be joining Adobe Research for an internship, working with Dr. Yang Zhou.



Research
TextToon: Real-Time Text Toonify Head Avatar from Single Video
Luchuan Song, Lele Chen, Celong Liu, Pinxin Liu, Chenliang Xu
SIGGRAPH Asia, 2024
project page / paper / code

We present a method to generate a drivable toonified avatar. Given a monocular video and a written instruction about the avatar style, it can generate a toonified avatar that can be animated in real time.

Tri2-plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song, Pinxin Liu, Lele Chen, Guojun Yin, Chenliang Xu
ECCV, 2024
project page / paper / code

We apply a multi-combined tri-plane structure to monocular photo-realistic volumetric head avatar reconstruction.

Adaptive Super Resolution for One-Shot Talking Head Generation
Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu
ICASSP, 2024
paper / code / video

We use mixed-resolution images in one-shot talking head training. The output resolution reaches 512px, up from 256px in previous work.

EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, Chenliang Xu
ACM'MM, 2024
paper / code

We introduce the EAGLE (Egocentric AGgregated Language-video Engine) model and dataset for egocentric video understanding tasks.

IDRNet: Intervention-driven relation network for semantic segmentation
Zhenchao Jin, Xiaowei Hu, Lingting Zhu, Luchuan Song, Li Yuan, Lequan Yu
NeurIPS, 2023
paper / code

We leverage a deletion diagnostics procedure in image segmentation.

Video understanding with large language models: A survey
Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, ..., Ping Luo, Jiebo Luo, Chenliang Xu
arXiv, 2023
paper / project page

This comprehensive survey covers video understanding techniques powered by large language models (Vid-LLMs).

Emotional Listener Portrait: Neural Listener Head Generation with Emotion
Luchuan Song, Guojun Yin, Zhenchao Jin, Xiaoyi Dong, Chenliang Xu
ICCV, 2023
paper / video

We propose a method to generate the head motion of a listening head with emotional expression.

Face Forgery Detection via Symmetric Transformer
Luchuan Song, Xiaodan Li, Zheng Fang, Zhenchao Jin, Yuefeng Chen, Chenliang Xu
ACM'MM, 2022
paper / video / code

We apply channel-wise and spatial-wise features to deepfake detection.

Adaptive Face Forgery Detection in Cross Domain
Luchuan Song, Zheng Fang, Xiaodan Li, Xiaoyi Dong, Zhenchao Jin, Yuefeng Chen, Siwei Lyu
ECCV, 2022
paper / code

We propose a case-adaptive softmax representation to solve the distribution fixation problem in deepfake detection.

You Should Look at All Objects
Zhenchao Jin, Dongdong Yu, Luchuan Song, Zehuan Yuan, Lequan Yu
ECCV, 2022
paper / code

We address the problem that detection performance on large-scale objects is usually suppressed after introducing FPN.

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
Jingqun Tang, Wenming Qian, Luchuan Song, Xiena Dong, Lan Li, Xiang Bai
ECCV, 2022
paper

We use reinforcement learning (DQN) to adjust the annotated bounding boxes in OCR.

TACR-NET: Editing on Deep Video and Voice Portraits
Luchuan Song, Bin Liu, Guojun Yin, Xiaoyi Dong, Yufei Zhang, Jiaxuan Bai
ACM'MM, 2021
paper / video

We edit not only the appearance of the talking head but also the voice, and we explore the relationship between the transferred voice features and lip-sync.

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu
CVPR, 2021 (oral)
paper / project page / code / video / dataset / challenge

ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data-scale (2.9M images, 221,247 videos), manipulations and annotations.




Teaching

CSC 245/445: Deep Learning - Fall 2024 & Fall 2023

This course covers deep learning, the family of neural network approaches that has dominated many of the recent advances in machine learning and artificial intelligence.

CSC 249/449: Machine Vision - Spring 2024

Fundamentals of computer vision, including image formation, elements of human vision, low-level image processing, and pattern recognition techniques.

Information Theory (B) - Fall 2018

This course mainly introduces the theories of source coding, channel coding and rate distortion. Other concepts, such as asymptotic equipartition property, entropy rate, and differential entropy, are also introduced.



Wonderful template from Jon Barron.