Deciding which colors to use for a grayscale film without any guidance can lead to a myriad of colorization outputs, some more believable than others. Unlike cinema, still photography carries a lighter colorization burden, which allows human text guidance to be used. This paper takes advantage of recent advances in large language models and neuro-symbolic artificial intelligence (AI) to extend human-guided image colorization to the video domain. We call the instantiation of this methodology "RAGCol++: Retrieval Augmented Generation based automatic video Colorization using Semantic Similarity Search and Probabilistic Grounded Knowledge". This work improves upon the original RAGCol paper by expanding the size and quality of COL-KG, adding efficiencies to the automatic video colorizer, and incorporating semantic similarity search in place of the original Cypher query-based search. Our system achieves an average improvement of 3% over the previous state of the art, L-CAD + BVD, across the DAVIS and Videvo datasets on the PSNR, SSIM, FID, FVD, and CDC metrics. This result is further substantiated by our user study, in which RAGCol++ was preferred 56% of the time.