Blog: Sharing the Video Edge

Written by Jim Blakley (@jimblakley) | April 16, 2019

(This is a repost of my Intel blog on the work at the Intel Science and Technology Center for Visual Cloud Systems at CMU, which is partially funded by Intel.)

You may recognize the images below as Bell System photos from the turn of the 20th century. They are real examples of what happens when a large system does not effectively share resources among its users. The infrastructure becomes complex, expensive, unwieldy and unmaintainable.

Multitenancy – designing the infrastructure for efficiently supporting multiple simultaneous users – is critical to making large systems practical. In the telephone network, automated switching and call multiplexing technologies enabled multitenancy. More recently, virtualization, containerization and cloud orchestration have enabled multitenancy for centralized cloud data centers. But, as we enter a new era of edge computing, multitenancy on distributed edge nodes brings new issues. Here at the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS), we’re researching some of those challenges.

The ISTC-VCS was launched in 2016 as a collaborative effort between Intel, Carnegie Mellon University (CMU) and Stanford University to research systems issues in large scale deployments of visual computing solutions. We anchored the work around smart city applications that depend on many distributed cameras in urban areas.

A typical smart city won’t have just one application but many that need to use data from the same distributed cameras. The traffic management department may want to monitor congestion to reconfigure traffic signal timings, the public transit authority may want to monitor passenger queues to adjust bus dispatch schedules, and the public safety office may want to study pedestrian behavior to see whether jaywalking and signal jumping warrant reconfiguring intersection layouts. As a sometimes urban bike commuter, I would love it if the city could tell me which intersections are prone to near-misses between vehicles and bicycles. I could then reroute to avoid those intersections.

But, if every department gets its own cameras and supporting edge infrastructure, street corners begin to look like the picture at left. Familiar? The ISTC-VCS is researching technologies to share cameras and edge nodes among applications. This problem is complex and multi-dimensional, so we have chosen to focus on efficiently sharing edge node computing and network resources for computer vision-driven applications.

At CMU in Pittsburgh, we have built an urban deployment to test our research. Like many distributed video networks, our architecture looks like Figure 1.

Our edge nodes are mounted on utility poles some 20 feet above the street. Each node contains an Intel® NUC Kit NUC6i7KYK (Skull Canyon) with dedicated storage and connects to a single high-definition computer vision camera. (More about the testbed in a future blog.) The critical architectural constraints are the network bandwidths from camera to edge node and from edge node to cloud. While each camera’s 1 Gbps connection to the edge node imposes some constraints, the bigger issue is that up to 8 edge nodes at a single intersection share a 35 Mbps commercial business internet connection to the backend data center. That service costs hundreds of dollars a month, and higher upload speeds are often not available in distributed street corner environments. For context, a single 4K/60fps/10-bit lossless JPEG2000 encoded video would require 2-3 Gbps. Visually lossless JPEG2000 would still be 200-300 Mbps. Typical computer vision cameras provide at best an H.264 encoded stream at 30-40 Mbps.

This limitation means that it is not possible to continuously stream all data from all cameras at a fidelity sufficient for many smart city applications. The camera and edge node must decide what information is processed at the edge and what gets sent to the central data center for further analysis. A common approach has been to have the cameras or edge nodes compress the video with a lossy encoder like H.264 or H.265. So, that 4K/60fps/10-bit lossless stream becomes a 1080p/30fps H.265-encoded stream at 3-6 Mbps. This method would allow us to stream our 8 camera feeds to the central data center – barely.
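To make the "barely" concrete, here is a rough back-of-the-envelope check in Python using only the figures quoted above. The even split of the uplink across nodes and the helper names are my own assumptions for illustration, not part of the testbed software.

```python
# Back-of-the-envelope uplink check using the figures quoted in the text.
UPLINK_MBPS = 35.0           # shared business internet connection per intersection
NODES_PER_INTERSECTION = 8   # upper bound quoted in the text


def per_node_share(uplink_mbps: float, nodes: int) -> float:
    """Even split of the uplink across edge nodes (an assumed policy)."""
    return uplink_mbps / nodes


def fits(stream_mbps: float, streams: int, uplink_mbps: float) -> bool:
    """True if `streams` feeds at `stream_mbps` each fit within the uplink."""
    return stream_mbps * streams <= uplink_mbps


share = per_node_share(UPLINK_MBPS, NODES_PER_INTERSECTION)
print(f"Even share per node: {share:.3f} Mbps")

# 3 and 6 Mbps bracket the H.265 range; 30 and 200 Mbps are the low ends of
# the H.264 camera stream and visually lossless JPEG2000 ranges.
for rate in (3.0, 6.0, 30.0, 200.0):
    total = rate * NODES_PER_INTERSECTION
    ok = fits(rate, NODES_PER_INTERSECTION, UPLINK_MBPS)
    print(f"{rate:6.1f} Mbps x {NODES_PER_INTERSECTION} = {total:7.1f} Mbps, fits: {ok}")
```

Eight feeds fit at the 3 Mbps end of the range (24 Mbps total) but not at the 6 Mbps end (48 Mbps total), which is why the full-stream approach only just works.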

Many modern applications, like the bus passenger queue counter, require high-fidelity images. Compressing to 3-6 Mbps to upload the entire stream reduces fidelity so much that these applications cannot function. Our work has focused on using edge computing to efficiently process high-fidelity images near their source and intelligently transmit only the useful information to the backend – and to do so for multiple applications running concurrently. The queue-counting app can access a high-fidelity view of the bus stop and the bicycle/vehicle near-miss app can view the bike lane while the system optimizes constrained edge nodes and network bandwidth. Two ISTC-VCS projects – Mainstream and FilterForward – take complementary approaches to this problem.

Mainstream – developed by CMU students Angela Jiang, Daniel L.K. Wong, Christopher Canel, Lilia Tang, and Ishan Misra under the direction of CMU professors Dave Andersen and Greg Ganger and Intel’s Michael Kaminsky, Mike Kozuch, and Padmanabhan (Babu) Pillai – uses transfer learning techniques to optimize deep neural network (DNN) inference runtime efficiency on multitenant edge nodes. Transfer learning has mainly been used to reduce the time and resources spent on neural network training, but Mainstream applies it to runtime resource optimization on edge nodes. Mainstream starts from the premise that edge node applications can use the common trunk of a deep neural network – say, MobileNet trained with the ImageNet dataset – but retrain the last layers of the network to optimize for each specific application. So, for example, the bike detector would train the last few layers using example frames that contain bicycles in different regions of the intersection.

As shown in Figure 2, in a multitenant runtime environment with N inferencing applications running, the main trunk of the neural net is executed only once per frame. Feature vectors from an optimal trunk layer are then run through each application-specific detector to classify objects for that application. Mainstream takes a unique runtime approach to optimize the tradeoffs between edge node resources and the accuracy of each application. The team showed (Figure 3) that Mainstream could maintain the accuracy (event F1 score) of an event detection application as multiple instances of it ran on a given edge node.
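The "run the trunk once, then fan out" idea can be sketched in a few lines of Python. This is a toy illustration only, not Mainstream's code: `SharedTrunkRunner`, `toy_trunk` and the threshold "heads" are hypothetical stand-ins for a frozen DNN trunk and retrained final layers, and the sketch uses a single fixed split point, whereas Mainstream chooses how much of the trunk each application shares.

```python
from typing import Callable, Dict, List

Frame = List[float]     # stand-in for pixel data
Features = List[float]  # stand-in for a trunk feature vector


class SharedTrunkRunner:
    """Runs one shared trunk per frame, then every registered app head."""

    def __init__(self, trunk: Callable[[Frame], Features]) -> None:
        self.trunk = trunk
        self.heads: Dict[str, Callable[[Features], bool]] = {}
        self.trunk_calls = 0  # counts how often the expensive part runs

    def register(self, app_name: str, head: Callable[[Features], bool]) -> None:
        self.heads[app_name] = head

    def process(self, frame: Frame) -> Dict[str, bool]:
        self.trunk_calls += 1
        features = self.trunk(frame)  # computed once per frame
        # Each application-specific head reuses the same features.
        return {name: head(features) for name, head in self.heads.items()}


def toy_trunk(frame: Frame) -> Features:
    """Hypothetical trunk: mean and max stand in for learned features."""
    return [sum(frame) / len(frame), max(frame)]


runner = SharedTrunkRunner(toy_trunk)
runner.register("bike_detector", lambda f: f[1] > 0.9)
runner.register("queue_counter", lambda f: f[0] > 0.5)

frames = [[0.1, 0.2, 0.95], [0.6, 0.7, 0.8]]
results = [runner.process(frame) for frame in frames]
print(results)
print("trunk executions:", runner.trunk_calls)
```

With two applications and two frames, the trunk runs twice rather than four times. The saving grows with the number of applications sharing the node.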

When Mainstream uses a common neural net for all applications (the blue “Max Sharing” line), the F1 score is much lower because there is no application-specific customization. When each application uses its own dedicated neural network (the green “No Sharing” line), the F1 score drops off quickly with the number of applications because the applications must reduce framerate to keep up.
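The framerate pressure behind the “No Sharing” curve can be illustrated with a simple cost model. The per-frame costs below (40 ms for the trunk, 2 ms per application head) are made-up numbers chosen only for illustration, and the model covers throughput only. It says nothing about the accuracy lost when applications share too much of the network.

```python
def achievable_fps(n_apps: int, trunk_ms: float, head_ms: float, shared: bool) -> float:
    """Frames per second one node can sustain under a simple additive cost model.

    shared=True:  the trunk runs once per frame, plus one head per application.
    shared=False: every application runs its own full network (trunk + head).
    """
    if shared:
        per_frame_ms = trunk_ms + n_apps * head_ms
    else:
        per_frame_ms = n_apps * (trunk_ms + head_ms)
    return 1000.0 / per_frame_ms


TRUNK_MS = 40.0  # hypothetical cost of the shared layers
HEAD_MS = 2.0    # hypothetical cost of one application-specific head

for n in (1, 2, 5, 10):
    with_sharing = achievable_fps(n, TRUNK_MS, HEAD_MS, shared=True)
    without_sharing = achievable_fps(n, TRUNK_MS, HEAD_MS, shared=False)
    print(f"{n:2d} apps: shared {with_sharing:5.1f} fps, dedicated {without_sharing:5.1f} fps")
```

Under these assumed costs, the two strategies are identical for one application, but with ten applications the dedicated-network case falls to a small fraction of the shared-trunk framerate.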

FilterForward complements the work of Mainstream to further optimize multitenant edge node resources and network bandwidth. Developed by CMU’s Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim and Dave Andersen and Intel’s Michael Kaminsky and Subramanya R. Dulloor, FilterForward uses inexpensive “microclassifiers” to find and forward relevant high-resolution video snippets from the edge node to the data center. Microclassifiers are highly efficient, application-specific event detectors that, like Mainstream’s application-specific detectors, draw from the feature vectors produced at each layer of a common “base DNN” run on every video frame at the edge node. A microclassifier can be developed to detect events across the full frame or within specific regions of the frame. This specificity allows the microclassifier to be highly efficient and run in parallel with other microclassifiers on the same edge node. For example, a jaywalker microclassifier could detect pedestrians stepping off the curb in the middle of the block – in, say, a specific 512x512 region of a 3840x2160 frame. A car-bicycle microclassifier could detect frames with both a bicycle and a car anywhere in the frame and forward a full 4K video snippet of the bicycle entering and leaving the frame. Both the jaywalker and car-bicycle detectors can run simultaneously on the same edge node, but the heavy lifting – analyzing jaywalker behavior or identifying true near-misses – is done at the data center with scalable cloud resources (see my Scanner blog).
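As a rough sketch of the filtering step, the Python below runs two toy microclassifiers over precomputed feature maps and records which frames each would forward. Everything here is hypothetical: the `Microclassifier` class, the 4x4 feature maps and the "hot cell" threshold rules are stand-ins, whereas real microclassifiers are small learned models fed by the base DNN's activations, and FilterForward forwards video snippets rather than frame indices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

FeatureMap = List[List[float]]      # toy 2D grid of base-DNN activations
Region = Tuple[int, int, int, int]  # (top, left, height, width) in grid cells


def crop(fmap: FeatureMap, region: Region) -> FeatureMap:
    """Return the sub-grid of the feature map covered by `region`."""
    top, left, height, width = region
    return [row[left:left + width] for row in fmap[top:top + height]]


def hot_cells(view: FeatureMap, threshold: float = 0.8) -> int:
    """Count activations above the threshold (a stand-in for a learned detector)."""
    return sum(1 for row in view for value in row if value > threshold)


@dataclass
class Microclassifier:
    name: str
    region: Optional[Region]  # None means "look at the full frame"
    predicate: Callable[[FeatureMap], bool]

    def matches(self, fmap: FeatureMap) -> bool:
        view = crop(fmap, self.region) if self.region else fmap
        return self.predicate(view)


def filter_and_forward(
    feature_maps: List[FeatureMap], classifiers: List[Microclassifier]
) -> Dict[str, List[int]]:
    """Map each classifier name to the frame indices it would forward."""
    forwarded: Dict[str, List[int]] = {c.name: [] for c in classifiers}
    for index, fmap in enumerate(feature_maps):  # base features computed once per frame
        for classifier in classifiers:
            if classifier.matches(fmap):
                forwarded[classifier.name].append(index)
    return forwarded


def blank() -> FeatureMap:
    return [[0.1] * 4 for _ in range(4)]


f0 = blank()                                  # nothing happening
f1 = blank(); f1[0][1] = 0.9                  # activity inside the curb region
f2 = blank(); f2[2][0] = 0.95; f2[3][3] = 0.9 # two objects elsewhere in frame

classifiers = [
    Microclassifier("jaywalker", (0, 0, 2, 2), lambda v: hot_cells(v) >= 1),
    Microclassifier("car_bicycle", None, lambda v: hot_cells(v) >= 2),
]
forwarded = filter_and_forward([f0, f1, f2], classifiers)
print(forwarded)
```

Restricting a classifier to a region is what keeps it cheap. In the example from the text, a 512x512 crop covers only about 3% of the pixels in a 3840x2160 frame.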

Figure 4 shows the accuracy vs. network bandwidth results for FilterForward as compared with compressing and sending the full video stream at different bit rates – better event detection accuracy and much lower bandwidth! FilterForward also increases the number of simultaneous applications that can be run on an edge node. Figure 5 shows the decline in system throughput as the number of applications increases for three types of FilterForward microclassifiers (blue, red, violet), a NoScope-like discrete classifier (yellow) and running a per-application MobileNet DNN (brown). The FilterForward microclassifiers show much better scaling as the number of applications increases.

Mainstream and FilterForward are only a small part of the edge multitenancy solution. They point to a direction for resource sharing in distributed video analytics, where complex algorithms are used to improve individual application success while staying within the space, bandwidth, power and cost constraints of a distributed edge node network.

Mainstream was first presented at USENIX ATC ’18 in July 2018. FilterForward just debuted at SysML 2019 this month. FilterForward code is available here for you to take out for a spin. Both projects are based on the Streaming Analytics Framework (SAF) for real-time video ingestion and analytics, primarily of live camera streams, with a focus on running state-of-the-art machine learning/computer vision algorithms. Please try out both FilterForward and SAF and stay tuned for a future blog on SAF. For more information, see the references below and follow me @jimblakley.

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

“Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing,” USENIX ATC ’18

FilterForward: “Scaling Video Analytics on Constrained Edge Nodes,” SysML 2019

“Scaling the Big Video Data Mountain”

“Visual Data: Pack Rat to Explorer”

“The Loneliness of the Expert”