Recently, during a conversation about how to optimize a machine learning application in the context of 5G and network-based edge computing, one of my colleagues pointed me to a 2017 paper on collaborative intelligence between the cloud and the mobile edge, authored by Yiping Kang et al.: http://web.eecs.umich.edu/~jahausw/publications/kang2017neurosurgeon.pdf

That paper analyses strategies for partitioning computation between a mobile device (referred to as the mobile edge) and the cloud, with the aim of achieving low latency, low energy consumption on the device and high data center throughput. In short, if we have an application, in particular a deep neural network (DNN), should we run it completely in the cloud, entirely on a mobile device, or in a distributed way, partly on the device and partly in the cloud? The paper doesn't look into leveraging network-based cloudlets; however, I read it with network-based cloudlets in mind.

The authors partitioned a DNN application of N layers such that the first M consecutive layers run on the mobile device while the remaining N-M layers run in the cloud. The output of the Mth layer is sent uplink across the radio interface from the mobile device to the cloud. Their key finding is that partitioning the computation at the granularity of DNN layers can lead to better results, because the individual layers differ in their data output (i.e. how much data is sent to the next layer) and in their computational demand (how much processing happens in a specific layer). For example, in the case of AlexNet for image classification, the optimal split appears to be to run all early layers up to the pooling layer “pool5” on the mobile device, while running the remaining layers in the cloud. The team created a scheduler called “Neurosurgeon” to partition such DNN computation between mobile device and data center on demand. For simplicity, the same DNN application code is deployed on the mobile device and in the cloud, although only a subset of the layers might actually be used on each side.
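To make the layer-level split decision more concrete, here is a minimal Python sketch. It is not the Neurosurgeon implementation: the function name best_split, the four-layer profile and all numbers are made up for illustration, and the latency model is deliberately simple (device compute + uplink transfer + cloud compute, ignoring the small downlink result and any fixed network overhead).

```python
from typing import Sequence, Tuple


def best_split(
    device_ms: Sequence[float],   # per-layer latency on the mobile device
    cloud_ms: Sequence[float],    # per-layer latency in the cloud
    output_kb: Sequence[float],   # per-layer output size
    input_kb: float,              # size of the raw input (e.g. an image)
    uplink_kb_per_ms: float,      # available uplink bandwidth
) -> Tuple[int, float]:
    """Return (M, latency_ms) for the split with the lowest end-to-end latency.

    The first M layers run on the device, the remaining N - M in the cloud.
    M = 0 uploads the raw input; M = N runs everything on the device.
    """
    n = len(device_ms)
    best_m, best_latency = 0, float("inf")
    for m in range(n + 1):
        if m == n:
            upload_kb = 0.0               # nothing leaves the device
        elif m == 0:
            upload_kb = input_kb          # cloud-only: send the raw input
        else:
            upload_kb = output_kb[m - 1]  # send the output of layer M
        latency = (
            sum(device_ms[:m])
            + upload_kb / uplink_kb_per_ms
            + sum(cloud_ms[m:])
        )
        if latency < best_latency:
            best_m, best_latency = m, latency
    return best_m, best_latency


# Hypothetical profile of a tiny network: conv, pool, fc1, fc2.
device_ms = [10.0, 2.0, 100.0, 40.0]
cloud_ms = [1.0, 1.0, 2.0, 2.0]
output_kb = [800.0, 100.0, 16.0, 4.0]
input_kb = 600.0

slow = best_split(device_ms, cloud_ms, output_kb, input_kb, 1.0)
fast = best_split(device_ms, cloud_ms, output_kb, input_kb, 100.0)
print(slow, fast)
```

With these made-up numbers, the slow uplink (1 KB/ms) favours a split after the pooling layer, where the data volume has dropped but the expensive fully connected layers are still ahead. With the fast uplink (100 KB/ms), uploading the raw input and running everything in the cloud wins.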

The analysis reminded me of the concept of edge-native applications for the following reasons:

In the terminology of Satyanarayanan et al. [1], an application that has its primary execution site on the device (Tier 3) and the edge cloud (Tier 2) is called an edge-native application. The same paper states that adaptive application behavior needs to be an integral part of what it means to be an edge-native application.

Here is my thinking: Instead of partitioning a DNN application between mobile device and cloud, one could also partition it between mobile device and edge cloud. In addition, the combination of DNN application code and partitioning scheduler can actually achieve adaptive application behavior (as shown by the Neurosurgeon paper).

The latter is the case because, at application run-time, the scheduler (partitioning engine) selects the best partition point (i.e. where to split the DNN execution, at the granularity of neural network layers) based on various dynamic factors, for example the measured available wireless uplink bandwidth from device to cloud(let) or variations in the server load on the cloud(let) side (which might be quite relevant for a micro datacenter). Imagine a stream of images from a mobile video camera being sent to the DNN application. Depending on, for example, the wireless network situation and the server load, the number of DNN layers executed on the mobile device and on the cloud(let) side would then vary over time through automated adjustments.
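The run-time adaptation described above can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions, not the Neurosurgeon scheduler: the layer profile, the choose_split function, the server_load multiplier and the trace of measurements are all hypothetical.

```python
# Hypothetical per-layer profile of a four-layer network (conv, pool, fc1, fc2).
DEVICE_MS = [10.0, 2.0, 100.0, 40.0]   # latency per layer on the device
CLOUD_MS = [1.0, 1.0, 2.0, 2.0]        # latency per layer on an unloaded server
OUTPUT_KB = [800.0, 100.0, 16.0, 4.0]  # output size per layer
INPUT_KB = 600.0                       # size of one camera frame


def choose_split(uplink_kb_per_ms: float, server_load: float) -> int:
    """Return M, the number of layers to run on the device.

    server_load >= 1.0 is an assumed multiplier that stretches the
    cloud(let) layer latencies when the server is busy.
    """
    n = len(DEVICE_MS)

    def latency(m: int) -> float:
        if m == n:
            upload_kb = 0.0               # everything stays on the device
        elif m == 0:
            upload_kb = INPUT_KB          # raw frame goes to the cloud(let)
        else:
            upload_kb = OUTPUT_KB[m - 1]  # output of layer M is uploaded
        return (
            sum(DEVICE_MS[:m])
            + upload_kb / uplink_kb_per_ms
            + server_load * sum(CLOUD_MS[m:])
        )

    return min(range(n + 1), key=latency)


# Hypothetical measurements over time: (uplink bandwidth in KB/ms, server load).
trace = [(1.0, 1.0), (100.0, 1.0), (100.0, 50.0), (0.5, 1.0)]
splits = [choose_split(bw, load) for bw, load in trace]
print(splits)
```

For this invented trace the chosen split moves from a mid-network partition, to cloud-only when the uplink is fast, to device-only when the server is heavily loaded, and to a late partition when the uplink is slow. A real scheduler would of course need measured or predicted values for both inputs.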

Overall, the Neurosurgeon paper provides some interesting results. Although for AlexNet in the field of computer vision a particular partitioning after M layers appears optimal (for both minimal energy consumption and latency), the result for other types of DNN appears to be more binary and the best partitioning turns out to be extreme, at least for the cases and circumstances investigated: either run the entire DNN on the mobile device or run it entirely on the cloud(let) side, but don’t split the DNN somewhere in the middle, as that could worsen performance.

Thus, of the eight types of DNN investigated, AlexNet seems to benefit from ‘true partitioning’, with M layers executed on the mobile device and N-M layers on the cloud(let) side. However, in that case the latency speedup shrinks with a faster wireless network: although the benefit is rather pronounced for a 3G cellular network, it is lower for an LTE network. It would be interesting to see what latency speedup remains with a 5G radio access network.

My takeaways from this were:

An AlexNet-powered DNN application that can be run optimally partitioned between a mobile LTE device and a cloudlet could be a further example of an edge-native application with the potential to improve performance (latency and energy consumption). However, my initial hypothesis is that the latency benefit from such partitioning might fade to some extent when moving from an LTE network to a 5G network, whilst the energy consumption benefit might still be considerable.

Other application-level solution architectures (e.g. combination of 'small' and 'big' neural networks whereby each neural network itself is not executed in a distributed way, but at a different location like device edge and network edge) may offer alternative or better ways to achieve the same ultimate goal: very fast and robust overall decision making with low energy consumption.

Finally, to achieve adaptive application behavior, real-time insight into, or predictions of, the conditions of (cellular) networks and the cloud(let) server load could be instrumental. Might this call for infrastructure monitoring that is brought together across networking and edge cloud computing?
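As an aside on the 'small' and 'big' neural network combination mentioned in the takeaways, one possible reading is a confidence-based cascade: a small model on the device answers when it is confident, and only uncertain inputs are escalated to a big model at the network edge. The Python sketch below is my own illustration of that idea; the cascade function, the threshold of 0.8 and the two stand-in models are hypothetical and not taken from either paper.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence between 0 and 1)


def cascade(
    sample: str,
    small_model: Callable[[str], Prediction],  # runs on the device
    big_model: Callable[[str], Prediction],    # runs on the cloudlet
    threshold: float = 0.8,                    # assumed confidence cut-off
) -> Tuple[str, str]:
    """Return (label, where the final decision was made)."""
    label, confidence = small_model(sample)
    if confidence >= threshold:
        return label, "device"       # confident enough, no upload needed
    label, _ = big_model(sample)     # escalate the uncertain input
    return label, "cloudlet"


# Stand-in models for illustration only; real ones would be neural networks.
def small_model(sample: str) -> Prediction:
    return ("cat", 0.95) if "clear" in sample else ("dog", 0.55)


def big_model(sample: str) -> Prediction:
    return ("cat", 0.99)


easy = cascade("clear image", small_model, big_model)
hard = cascade("blurry image", small_model, big_model)
print(easy, hard)
```

Each network runs entirely at one location here, so nothing is split at layer level; the adaptivity comes from deciding per input whether the second location is needed at all.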

[1] Mahadev Satyanarayanan et al.: “The Seminal Role of Edge-Native Applications”, Proceedings of IEEE Edge 2019, Milan, Italy, July 2019.