FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis: Settings

This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video.



:::info (1) Feng Liang, The University of Texas at Austin and Work partially done during an internship at Meta GenAI (Email: jeffliang@utexas.edu);

(2) Bichen Wu, Meta GenAI and Corresponding author;

(3) Jialiang Wang, Meta GenAI;

(4) Licheng Yu, Meta GenAI;

(5) Kunpeng Li, Meta GenAI;

(6) Yinan Zhao, Meta GenAI;

(7) Ishan Misra, Meta GenAI;

(8) Jia-Bin Huang, Meta GenAI;

(9) Peizhao Zhang, Meta GenAI (Email: stzpz@meta.com);

(10) Peter Vajda, Meta GenAI (Email: vajdap@meta.com);

(11) Diana Marculescu, The University of Texas at Austin (Email: dianam@utexas.edu).

:::

5. Experiments

5.1. Settings

Implementation Details. We train our model on 100k videos from Shutterstock [1]. For each training video, we sequentially sample 16 frames at an interval of {2, 4, 8} frames, corresponding to clips lasting {1, 2, 4} seconds (assuming a 30-FPS source). All images, including input frames, spatial condition images, and flow-warped frames, are center-cropped to a resolution of 512×512. We train with a batch size of 1 per GPU, for a total batch size of 8 across 8 GPUs. We use the AdamW optimizer [28] with a learning rate of 1e-5 for 100k iterations. As detailed in our method, we jointly train the main U-Net and the ControlNet U-Net branches with v-parameterization [41]. Training takes four days on a single node with eight A100-80GB GPUs.
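For concreteness, here is a minimal Python sketch of the clip sampling and cropping described above. The function names are illustrative (not from the FlowVid codebase), and the spacing arithmetic assumes a 30-FPS source video as stated in the text.

```python
import random

NUM_FRAMES = 16
INTERVALS = (2, 4, 8)  # frame gaps -> 1 s, 2 s, 4 s clips at 30 FPS

def sample_clip_indices(num_video_frames: int) -> list[int]:
    """Pick a random interval, then take 16 sequential frames at that spacing."""
    interval = random.choice(INTERVALS)
    span = (NUM_FRAMES - 1) * interval
    start = random.randint(0, num_video_frames - span - 1)
    return [start + i * interval for i in range(NUM_FRAMES)]

def center_crop_box(height: int, width: int) -> tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the largest centered square;
    the crop would then be resized to 512x512 (e.g., with PIL or OpenCV)."""
    side = min(height, width)
    top, left = (height - side) // 2, (width - side) // 2
    return top, left, top + side, left + side
```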


During generation, we first generate keyframes with our trained model and then use an off-the-shelf frame interpolation model, such as RIFE [21], to generate the non-keyframes. By default, we produce 16 keyframes at an interval of 4, corresponding to a 2-second clip at 8 FPS; we then use RIFE to interpolate the result to 32 FPS. We employ classifier-free guidance [15] with a scale of 7.5 and 20 inference sampling steps, and we use the Zero SNR noise scheduler [27]. Following FateZero [35], we also fuse the self-attention features obtained during DDIM inversion of the corresponding keyframes from the input video. We evaluate our FlowVid with two different spatial conditions: canny edge maps [5] and depth maps [38]. A comparison of these controls can be found in Section 5.4.
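Putting these defaults together, a minimal sketch of the generate-then-interpolate pipeline might look like the following. Here `edit_keyframes` and `rife_interpolate` are placeholder names standing in for the trained model and an off-the-shelf RIFE wrapper, not real APIs; only the numbers (16 keyframes, interval 4, guidance 7.5, 20 steps, 8→32 FPS) come from the text.

```python
NUM_KEYFRAMES = 16   # keyframes edited by the model
KEY_INTERVAL = 4     # source-frame gap between keyframes
KEY_FPS = 8          # 16 keyframes over a 2 s clip
TARGET_FPS = 32      # after 4x frame interpolation

def edit_keyframes(frames, prompt, guidance_scale=7.5, num_steps=20,
                   condition="depth"):
    """Placeholder for the trained model: classifier-free guidance at 7.5,
    20 sampling steps, Zero SNR schedule, canny- or depth-conditioned."""
    raise NotImplementedError  # assumed model call, not a real API

def rife_interpolate(frames, factor):
    """Placeholder for RIFE frame interpolation (here 8 FPS -> 32 FPS)."""
    raise NotImplementedError  # assumed wrapper, not RIFE's real interface

def synthesize(video_frames, prompt):
    # 1) Take 16 keyframes at interval 4: a 2-second clip at 8 FPS.
    keys = [video_frames[i]
            for i in range(0, NUM_KEYFRAMES * KEY_INTERVAL, KEY_INTERVAL)]
    # 2) Edit the keyframes with the trained V2V model.
    edited = edit_keyframes(keys, prompt)
    # 3) Interpolate the edited keyframes up to the target frame rate.
    return rife_interpolate(edited, factor=TARGET_FPS // KEY_FPS)
```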

Evaluation. We select 25 object-centric videos from the public DAVIS dataset [34], covering humans, animals, etc. We manually design 115 prompts for these videos, spanning stylization to object swaps. In addition, we collect 50 Shutterstock videos [1] with 200 designed prompts. We conduct both qualitative (see Section 5.2) and quantitative (see Section 5.3) comparisons with state-of-the-art methods, including Rerender [49], CoDeF [32], and TokenFlow [13], using their official code with default settings.


:::info This paper is available on arXiv under a CC 4.0 license.

:::
