Blog: Seeing Further Down the Visual Cloud Road

Written by Jim Blakley | December 4, 2019
(Note: This blog is a republication of my final blog while at Intel.)

Almost three years ago, Carnegie Mellon University Prof. Dave Andersen and I announced the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS) at the 2016 NAB Show. Along with Prof. Kayvon Fatahalian at Stanford, Dave has led the center and collaborated closely with other academic and Intel Labs researchers to push the boundaries in visual cloud systems. We set out to study and find solutions for some of the key problems with gathering, storing and analyzing video data in large-scale distributed environments. With the completion of the center now drawing near, it’s time to take stock of the results and to talk of work yet to be done.

The center set out to look at four primary visual cloud challenges:

How can visual data processing and analytics be scaled out across many compute nodes?
How can visual data be stored, managed and accessed in an integrated way?
How can distributed sensors, edge nodes and cloud data centers efficiently collaborate on visual analytics workflows?
How can visual information be queried to gain insights about the content itself?

The center’s approach has been to bring together systems researchers and computer vision, AI and graphics researchers to create prototype systems that allow investigation of these topics. The prototype systems have been integrated in an end-to-end reference architecture and exercised in an urban testbed in Pittsburgh where we’ve focused on the problems of smart city video applications. With this approach, we were able to understand more fully the challenges of real-world applications and create, as a byproduct, code that others can use to continue the experiments. I’ve talked about many of these efforts in previous blogs. This blog is a summary of the accomplishments, the key learnings and some of the big challenges remaining.

What did we make? First, the concrete stuff. We produced four open source systems platforms, one for each of the research vectors:

Scanner – to scale out video processing and analytics workloads across large numbers of compute nodes
The Visual Data Management System (VDMS) – to efficiently integrate the storage and retrieval of visual information and metadata associated with that visual data
The Streaming Analytics Framework (SAF) – to connect cameras, edge nodes and clouds together for distributed end-to-end video processing
Eureka – to enable the search for machine learning training data on data distributed at the edge

In addition, some other valuable systems were created, including Esper – an application framework for video analytics built on Scanner, OpenRTIST – a generative graphics application that shows the value of edge computing for user interactive visual experiences, and Rekall – a Python library for programmatically specifying complex events of interest in video.

We published many academic papers (listed below) including innovative publications in multitenant video processing at the edge and distributed neural network training. To make the work tangible, we demonstrated many use cases including 360 degree video creation, volumetric rendering, drone video analytics, video summarization, media dataset analysis, medical imaging, retail shopper tracking and traffic monitoring. For example, for CVPR 2019, Intel researcher Pablo Muñoz led the creation of a 3D pose estimation demonstration using Scanner to improve visual quality for sports volumetric rendering. For IBC 2019, Intel researchers Chaunte Lacewell, Ragaad Altarawneh, Pablo Muñoz and Luis Remis led the creation of a video summarization demonstration, also using Scanner, together with VDMS and GStreamer.

Two new startups have also been formed using the work of the center: ApertureData, focused on VDMS and founded by former Intel Labs researchers Vishakha Gupta and Luis Remis, and a new company focused on edge analytics, founded by CMU Prof. Dave Andersen and former Intel researcher Michael Kaminsky.

But, more importantly, what did we learn?

There have been many learnings from the center but here are my top seven:

Multitenancy and heterogeneity at the edge – Just as virtualized data centers and clouds needed to protect co-resident applications from noisy and nosy neighbors, shared edge networks will need to build in multitenancy. A blind pedestrian relying on real-time intersection navigation mustn’t be impacted by a nearby group of augmented reality gamers sharing the same edge node. Video data stored at the edge for, say, law enforcement purposes must be inaccessible from applications that citizens use to check traffic at a specific intersection.

Similarly, with the higher costs of deploying resources to edges, it is an economic necessity to provide and share CPU, GPU, FPGA, network and storage across multiple edge tenants. An operator can’t afford to allocate, say, a full high-end GPU to a single user for the duration of an augmented reality game.

Bandwidth cost and capacity drive architecture – In the abstract, it’s tempting to say “just stream it all to the cloud”. In practice, real-world connectivity limits that approach. The consumer and commercial Internet often has upload speeds much lower than download speeds, and cameras are mostly about upload. Even if massive upload is available, it comes at a price. Yet, distributed video analytics applications often require high resolution content to perform well. This means placing some analytics at the edge to intelligently determine what to send to the cloud. Balancing this edge compute cost against network cost and capacity will be a primary engineering challenge of edge infrastructure.
Video analytics aren’t the same as data analytics – The big data revolution includes the video analytics revolution, but the tools, algorithms and skills needed to analyze more traditional data are not the same as those needed to work with visual data. Yes, both types of workloads make heavy use of multi-dimensional matrix multiplies, and running a neural network on a pixel array bears a passing resemblance to, say, a security pricing algorithm. But using Apache Spark or Pandas, which were designed for table, string and numeric data types, to build a computer vision application on video, image and pixel data types is the wrong approach for the job. A system like Scanner complements a system like Spark by treating visual data types as first-class citizens.
Visual applications are multimodal – The flip side of the previous learning is that visual data rarely lives alone in an application. It is almost always joined with metadata associated with the visual data – metadata like date, time, duration, licenses or tags of objects in the video. It will often be joined with other time-synchronized sensor data like audio, lidar, altitude, or GPS position. And it may be analyzed in the context of non-synchronous but related information like traffic signal timings or the previous 24-hour snowfall data at an intersection monitored by a traffic camera.
Edge to cloud workload and data distribution is an art, not a science – With the broad diversity of infrastructure capabilities and application needs, there are few easy methods and even fewer tools to guide application developers in dividing application execution and data between intelligent end devices, edge infrastructure and cloud data centers. Distribution is therefore highly dependent on developer knowledge, intuition and ease of development and that distribution tends to be static with little adjustment to varying conditions like network load or device capability.
Perpetual training with a human-in-the-loop is our near future – Much early artificial intelligence research has focused on algorithms and network design. However, as large-scale system applications emerge, teaching them how to respond to novel and unanticipated situations – like construction of a new building or a new need to differentiate buses from trucks – becomes a key requirement. With today’s technologies, these situations often require retraining the AI solution. And that usually requires a lengthy and labor-intensive manual collection of an enhanced training data set. The cycle time from need to solution is gated by that manual effort.
There is no algorithm holy grail – The system designer should view the panoply of neural network, machine learning, computer vision, image processing and data analytics algorithms as a design palette. Algorithms are suited to different application requirements and available resources. An algorithm that is best suited for a power-constrained smartphone may be bettered by another algorithm when the application executes in the cloud. Algorithm innovation also continues unabated, so today’s best algorithm may be replaced by another tomorrow. And using multiple algorithms together, say, to allow a low cost algorithm to triage content for a high cost algorithm, can yield efficiency gains.
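
The last learning, a low cost algorithm triaging content for a high cost one, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from Scanner, SAF or any other system from the center: the `Frame` type, the `cheap_motion_score` frame-difference filter, the `expensive_detector` stand-in and the threshold value are all hypothetical. The same pattern is one way an edge node might decide which frames are worth sending upstream.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Frame:
    index: int
    pixels: List[int]  # toy grayscale values in the range 0-255


def cheap_motion_score(prev: Frame, cur: Frame) -> float:
    """Low cost filter: mean absolute pixel difference between two frames."""
    diffs = [abs(a - b) for a, b in zip(prev.pixels, cur.pixels)]
    return sum(diffs) / len(diffs)


def expensive_detector(frame: Frame) -> str:
    """Hypothetical stand-in for a high cost model such as a deep network."""
    return "object" if max(frame.pixels) > 128 else "background"


def triage(
    frames: List[Frame],
    threshold: float,
    expensive_model: Callable[[Frame], str],
) -> List[Tuple[int, str]]:
    """Run expensive_model only on frames that changed enough from the previous one."""
    results = []
    prev = None
    for frame in frames:
        # The first frame has no predecessor, so it is never forwarded here.
        if prev is not None and cheap_motion_score(prev, frame) >= threshold:
            results.append((frame.index, expensive_model(frame)))
        prev = frame
    return results


# Four toy frames: only frame 2 differs noticeably from its predecessor.
frames = [
    Frame(0, [10, 10, 10, 10]),
    Frame(1, [10, 10, 10, 10]),
    Frame(2, [200, 200, 10, 10]),
    Frame(3, [200, 200, 10, 10]),
]

results = triage(frames, threshold=20.0, expensive_model=expensive_detector)
forwarded_fraction = len(results) / len(frames)

print(results)
print(forwarded_fraction)
```

In this toy run the expensive stand-in is invoked on one frame out of four. A real deployment would need to tune the threshold against the accuracy it is willing to give up, since anything the cheap filter drops is never seen by the expensive model.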

“The more you know, the more you know you don't know” ― Aristotle

After three years, we certainly know a lot more than we did at the beginning. But all the areas above are rich for further research. Look for a future blog that outlines Intel Labs’ ongoing research agenda for visual cloud systems. To foreshadow that agenda, you can be pretty sure it will concentrate on the above issues as well as on newer areas like applying AI techniques to media and graphics pipelines.

Conclusion

Intel Science and Technology Centers are intended to foster collaboration and the development of communities among Intel and academia, and the ISTC-VCS has exemplified that intent. We’d like to see that collaboration continue. Please reach out to any of the involved faculty for more information on their work. For more on Intel’s work in the center and our future research agenda in this domain, please contact Intel Fellow Ravishankar Iyer. See below for most of the public publications, presentations and code from the center.

A wrap-up of the ISTC-VCS would not be complete without acknowledging our friend and colleague, Intel researcher Scott Hahn, who passed away suddenly in June 2018. Scott was instrumental in the formation of the center and led Intel’s involvement until his death. He is deeply missed and he would have been proud of all we accomplished.

References

For Droid Eyes Only -- CANVAS (Nov 2019)
Overcoming Visual Analysis Paralysis -- Scanner, Spark, VDMS, Pandas and Apache Arrow (Oct 2019)
Sharing the Video Edge -- Mainstream and FilterForward (Apr 2019)
Feeling a Little Edgy -- OpenRTIST (Mar 2019)
The Loneliness of the Expert -- Eureka (Mar 2019)
Visual Data: Pack Rat to Explorer -- VDMS (Feb 2019)
Scaling the Big Video Data Mountain -- Scanner (Jan 2019)

Scanner

“Scanner: Efficient Video Analysis at Scale”, Alex Poms, Will Crichton, Pat Hanrahan, Kayvon Fatahalian, SIGGRAPH 2018.
“Scanner: Efficient Video Analysis at Scale”, Alex Poms, Will Crichton, Strata Data Conference 2019.
CODE: http://scanner.run/ and https://github.com/scanner-research/scanner

Visual Data Management System

University of Washington Seminar
“VDMS: Efficient Big-Visual-Data Access for Machine Learning Workloads”, Luis Remis, Vishakha Gupta-Cledat, Christina Strong, Ragaad Altarawneh, NeurIPS 2018.
CODE: https://github.com/IntelLabs/vdms

Edge to Cloud Video Analytics

“Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing”, Angela Jiang, Daniel L.-K. Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael A. Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory R. Ganger, 2018 USENIX Annual Technical Conference (USENIX ATC '18), Boston, MA, July 2018.
FilterForward: “Scaling Video Analytics on Constrained Edge Nodes”, Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor, SysML, Palo Alto, CA, 2019.
CODE: Streaming Analytics Framework
“Navigating the Visual Fog: Analyzing and Managing Visual Data from Edge to Cloud” , Ragaad Altarawneh, Christina Strong, Luis Remis, Pablo Muñoz, Addicam Sanjay, Srikanth Kambhatla
“ Towards Drone-sourced Live Video Analytics for the Construction Industry ” Authors: Shilpa George (CMU), Junjue Wang (CMU), Mihir Bala (U of Michigan), Thomas Eiszler (CMU), Padmanabhan Pillai (Intel Labs), Mahadev Satyanarayanan (CMU), Proceedings of the 4th ACM/IEEE Symposium on Edge Computing (SEC '19), 2019.
“ Bandwidth-Efficient Live Video Analytics for Drones Via Edge Computing ”, Junjue Wang, Ziqiang Feng, Zhuo Chen, Shilpa George, Mihir Bala, Padmanabhan Pillai, Shao-Wen Yang, and Mahadev Satyanarayanan, 2018 IEEE/ACM Symposium on Edge Computing (SEC), Seattle, WA, 2018.
“ Focus: Querying Large Video Datasets with Low Latency and Low Cost ”, Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Shivaram Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, and Onur Mutlu, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018.
“ Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels ”, Daniel Y Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, and Kayvon Fatahalian, arXiv preprint arXiv:1910.02993, October 7, 2019.
“Cloudlet-Based Just-in-Time Indexing of IoT Video”, M. Satyanarayanan, P. B. Gibbons, L. Mummert, P. Pillai, P. Simoens and R. Sukthankar, 2017 Global Internet of Things Summit (GIoTS), 2017.
OpenRTIST (edge augmented reality application) on YouTube -- CODE: https://github.com/cmusatyalab/openrtist and Android Client App
“An Empirical Study of Latency in an Emerging Class of Edge Computing Applications for Wearable Cognitive Assistance”, Zhuo Chen, Wenlu Hu, Junjue Wang, Siyan Zhao, Brandon Amos, Guanhang Wu, Kiryong Ha, Khalid Elgazzar, Padmanabhan Pillai, Roberta Klatzky, Daniel Siewiorek, and Mahadev Satyanarayanan, Proceedings of the Second ACM/IEEE Symposium on Edge Computing (SEC '17), 2017.
“ Enabling Live Video Analytics with a Scalable and Privacy-Aware Framework ”, Junjue Wang, Brandon Amos, Anupam Das, Padmanabhan Pillai, Norman Sadeh, and Mahadev Satyanarayanan, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) – Special Section on Delay-Sensitive Video Computing in the Cloud and Special Section on Extended MMSysNOSSDAV Best Papers, Vol. 14, No. 3s, Article 64, June 2018.
“ A Scalable and Privacy-Aware IoT Service for Live Video Analytics ”, Junjue Wang, Brandon Amos, Anupam Das, Padmanabhan Pillai, Norman Sadeh, and Mahadev Satyanarayanan, Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17), 2017.
“ Live Synthesis of Vehicle-Sourced Data Over 4G LTE ”, Wenlu Hu, Ziqiang Feng, Zhuo Chen, Jan Harkes, Padmanabhan Pillai, and Mahadev Satyanarayanan, Proceedings of the 20th ACM International Conference on Modelling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM '17). 2017.
“ Picking Interesting Frames in Streaming Video ”, Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G Andersen, Michael Kaminsky, and Subramanya R Dulloor, 2018 SysML Conference, 2018.
“ Towards a Distraction-free Waze ”, Kevin Christensen, Christoph Mertz, Padmanabhan Pillai, Martial Hebert, and Mahadev Satyanarayanan, Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, (HotMobile '19), 2019.
“ Towards Scalable Edge-Native Applications ”, Junjue Wang, Ziqiang Feng, Shilpa George, Roger Iyengar, Padmanabhan Pillai, and Mahadev Satyanarayanan, Proceedings of the 4th ACM/IEEE Symposium on Edge Computing (SEC '19), 2019.
“ What Actions Are Needed for Understanding Human Actions in Videos?”, Gunnar A. Sigurdsson, Olga Russakovsky, and Abhinav Gupta, The IEEE International Conference on Computer Vision (ICCV), 2017.
“ Assisting Users in a World Full of Cameras: A Privacy-Aware Infrastructure for Computer Vision Applications ”, A. Das, M. Degeling, X. Wang, J. Wang, N. Sadeh and M. Satyanarayanan, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.
“ Augmenting Cognition Through Edge Computing ”, M. Satyanarayanan and N. Davies, Computer, Vol. 52, No. 7, July 2019.
“OpenFace: A General-Purpose Face Recognition Library with Mobile Applications”, B. Amos, B. Ludwiczuk, and M. Satyanarayanan, Technical report, CMU, 2016.
“ The Seminal Role of Edge-Native Applications ”, M. Satyanarayanan, G. Klas, M. Silva and S. Mangiante, 2019 IEEE International Conference on Edge Computing (EDGE), 2019.
“ Technological Framework for Edge-to-Cloud Complex Visual Analytics Applications ”, J. Pablo Muñoz, and Luis Remis. The Hawaii International Conference on System Sciences 2019.

Data Sets and Training for Video Analytics

“Edge-based Discovery of Training Data for Machine Learning”, Ziqiang Feng, Shilpa George, Jan Harkes, Padmanabhan Pillai, Roberta Klatzky, Mahadev Satyanarayanan, 2018 Third ACM/IEEE Symposium on Edge Computing and IEEE Internet Computing, vol. 23, no. 4, pp. 35-42, 1 July-Aug. 2019 -- CODE: https://github.com/fzqneo/eureka-yfcc100m
“ EVA: An Efficient System for Exploratory Video Analysis ”, Ziqiang Feng, Junjue Wang, Jan Harkes, Padmanabhan Pillai, and Mahadev Satyanarayanan, SysML 2018
“Accelerating Deep Learning by Focusing on the Biggest Losers”, Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai, published to arXiv, October 2019.
“ EdgeDroid: An Experimental Approach to Benchmarking Human-in-the-Loop Applications ”, Manuel Osvaldo J. Olguín Muñoz, Junjue Wang, Mahadev Satyanarayanan, and James Gross, Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, (HotMobile '19), 2019.
“ MLtuner: System Support for Automatic Machine Learning Tuning ”, Henggang Cui, Gregory R Ganger, and Phillip B Gibbons, March 20, 2018.

Computer Vision and Machine Learning Algorithms

“ HydraNets: Specialized Dynamic Architectures for Efficient Inference ”, Ravi Teja Mullapudi, William R. Mark, Noam Shazeer, and Kayvon Fatahalian, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
“ 3D Human Pose Estimation = 2D Pose Estimation + Matching ”, Ching-Hang Chen and Deva Ramanan, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
“ ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification ”, Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, and Bryan Russell, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
“ Actor and Observer: Joint Modeling of First and Third-Person Videos ”, Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, and Karteek Alahari, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
“ Asynchronous Temporal Fields for Action Recognition ”, Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, and Abhinav Gupta, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
“ Attentional Pooling for Action Recognition ”, Rohit Girdhar and Deva Ramanan, Advances in Neural Information Processing Systems (NIPS 2017), 2017.
“ Brute-Force Facial Landmark Analysis With a 140,000-Way Classifier ”, Mengtian Li, Laszlo Jeni, and Deva Ramanan, Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 2018.
“ Depth-Based Hand Pose Estimation: Methods, Data, and Challenges ”, James Steven Supančič, Gregory Rogez, Yi Yang, Jamie Shotton, and Deva Ramanan, International Journal of Computer Vision, Vol. 126, No. 11, November 2018.
“ Proteus: Agile ML Elasticity through Tiered Reliability in Dynamic Resource Markets ”, Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, and Phillip B. Gibbons, Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17), 2017.
“From Images to 3D Shape Attributes”, D. F. Fouhey, A. Gupta and A. Zisserman, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 1, January 2019.
“Hardware Conditioned Policies for Multi-Robot Transfer Learning”, Tao Chen, Adithyavairavan Murali, and Abhinav Gupta, Advances in Neural Information Processing Systems 31, 2018.
“ The Pose Knows: Video Forecasting by Generating Pose Futures ”, Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert, The IEEE International Conference on Computer Vision (ICCV), 2017.
“ Finding Tiny Faces ”, Siva Chaitanya Mynepalli, Peiyun Hu, Deva Ramanan, The IEEE International Conference on Computer Vision (ICCV), 2019.
“ Spatial Memory for Context Reasoning in Object Detection ”, Xinlei Chen and Abhinav Gupta, The IEEE International Conference on Computer Vision (ICCV), 2017.
“ Submodular Trajectory Optimization for Aerial 3D Scanning ”, Mike Roberts, Debadeepta Dey, Anh Truong, Sudipta Sinha, Shital Shah, Ashish Kapoor, Pat Hanrahan, and Neel Joshi, The IEEE International Conference on Computer Vision (ICCV), 2017.
“ Towards Segmenting Anything That Moves ”, Achal Dave, Pavel Tokmakov, and Deva Ramanan, The IEEE International Conference on Computer Vision (ICCV), 2019.
“ Videos as Space-Time Region Graphs ”, Xiaolong Wang and Abhinav Gupta, The European Conference on Computer Vision (ECCV), 2018.
“ Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs ”, Xiaolong Wang, Yufei Ye, and Abhinav Gupta, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
