
MobileNet

MobileNet is Google's family of efficient CNN architectures designed for on-device (mobile / embedded) inference. The defining architectural choice is depthwise-separable convolutions — factoring a standard convolution into a depthwise stage (one filter per input channel) followed by a 1×1 pointwise convolution — which cuts parameter count and multiply-add count by roughly an order of magnitude at comparable accuracy on image-classification tasks. There are three major versions: v1 (2017, arXiv:1704.04861), v2 (2018, inverted residuals + linear bottlenecks), and v3 (2019, NAS-discovered blocks + h-swish + squeeze-and-excitation).
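The cost saving of the factorization follows directly from the v1 paper's arithmetic: a standard Dk×Dk convolution with M input and N output channels on a DF×DF feature map costs Dk·Dk·M·N·DF·DF multiply-adds, while the depthwise + pointwise pair costs Dk·Dk·M·DF·DF + M·N·DF·DF, a reduction factor of 1/N + 1/Dk². A minimal sketch (the example layer sizes are illustrative, not taken from any specific MobileNet variant):

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-adds for a standard dk x dk convolution:
    one dk*dk*m filter per output channel, applied at df*df positions."""
    return dk * dk * m * n * df * df

def separable_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise stage (one dk x dk filter per input channel)
    plus a 1x1 pointwise stage mapping m -> n channels."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

# Example: 3x3 conv, 256 -> 256 channels, 14x14 feature map.
std = standard_conv_macs(3, 256, 256, 14)
sep = separable_conv_macs(3, 256, 256, 14)
print(std / sep)  # ~8.7x fewer multiply-adds, i.e. 1 / (1/256 + 1/9)
```

For 3×3 kernels the 1/Dk² term dominates, which is why the v1 paper reports roughly 8–9× fewer computations than standard convolutions.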

Stub page. The sysdesign-wiki's only current ingested source mentioning MobileNet covers it as a building block of a larger distillation pipeline (YouTube's real-time generative AI effects). Per-version architectural detail, latency-vs-accuracy numbers, and quantisation / compilation behaviour are not in the raw source. This page will expand as sources that actually measure MobileNet-family behaviour are ingested.

Why the sysdesign-wiki cares about MobileNet

MobileNet exists because of an on-device inference constraint. Standard CNN architectures (VGG, ResNet) were designed for datacentre GPUs and exceed mobile compute / memory / battery budgets. MobileNet is the canonical wiki instance of "architecture selected by the serving substrate, not by the training task" — the model class exists because phones exist, not because the vision task required it. This makes MobileNet recurring wiki vocabulary for on-device ML discussions:

  • Encoder backbone for image classification / detection / segmentation on mobile.
  • Student backbone in distillation pipelines targeting mobile inference.
  • Block primitive reused inside other architectures (e.g. MobileNet-block decoders in UNet-style image-to-image students on mobile).

Usage in YouTube's real-time generative AI effects

The 2025-08-21 post names MobileNet twice (Source: sources/2025-08-21-google-from-massive-models-to-mobile-magic-tech-behind-youtube-real-time-generative-ai):

  • As the encoder backbone of YouTube's on-device student model — "a design known for its performance on mobile devices".
  • As the block primitive for the student's decoder — "a decoder that utilizes MobileNet blocks".

Both are UNet components; MobileNet is the substrate choice inside UNet's encoder / decoder slots rather than a standalone model.
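The post does not disclose the student's layer layout, so purely as an illustration of how stride-2 MobileNet-style stages fill UNet's encoder / decoder slots, here is a stdlib-only resolution trace (the stage count and input size are invented for the sketch):

```python
def unet_resolutions(input_size: int, num_stages: int = 4):
    """Trace spatial resolution through a UNet whose encoder is a stack of
    stride-2 downsampling blocks and whose decoder upsamples back by 2x,
    matching one encoder skip connection per stage."""
    enc = [input_size]
    for _ in range(num_stages):          # encoder: stride-2 blocks halve resolution
        enc.append(enc[-1] // 2)
    dec = [enc[-1]]
    for skip in reversed(enc[:-1]):      # decoder: 2x upsample per stage
        dec.append(dec[-1] * 2)
        assert dec[-1] == skip           # skip-connection resolutions must match
    return enc, dec

enc, dec = unet_resolutions(256)
print(enc)  # [256, 128, 64, 32, 16]
print(dec)  # [16, 32, 64, 128, 256]
```

The point of the sketch: the encoder and decoder slots only constrain stride pattern and channel interfaces, so any block family with those properties — here, MobileNet blocks — can be dropped in.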

Reported metrics (from the ingested source set)

No MobileNet-specific latency / accuracy / parameter numbers are disclosed in the 2025-08-21 YouTube post. For canonical MobileNet benchmarks see the upstream v1/v2/v3 papers on arXiv.
