What can you do with CoreML?

CoreML is Apple’s on-device machine learning framework. It lets an iOS, macOS, watchOS, or tvOS application run a trained model locally, on the CPU, GPU, or Neural Engine, without sending data to a server. The framework itself is narrow. It loads a model file (.mlmodel or .mlpackage, which Xcode compiles to .mlmodelc for the app bundle), exposes a typed prediction API, and dispatches the computation to whichever compute unit it judges fastest.

The interesting question for engineers is rarely “what can CoreML do?” in the abstract. It is: which models convert cleanly, which silently degrade in the conversion step, and where CoreML fits when the same model also has to run on Android or the web.

What CoreML actually runs

CoreML accepts models that have been converted into its own format, usually via coremltools. The path most teams use today is PyTorch or TensorFlow → CoreML directly through coremltools.convert. The older PyTorch → ONNX → CoreML route relied on an ONNX converter that current coremltools releases no longer ship. Once converted, the model is a self-contained file (or .mlpackage bundle) embedded in the app bundle.

At runtime, the framework decides where to execute the graph:

Neural Engine: dedicated ML accelerator, available to CoreML on A12 and later A-series chips and on M-series chips. Fastest and lowest power for supported operators.
GPU: used when the graph contains operators the Neural Engine does not support, or when batch shapes vary.
CPU: fallback for unsupported operators and for cold-start paths.

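You do not pick the device manually in normal use. You hint at it by setting MLModelConfiguration.computeUnits to an MLComputeUnits value (.all, .cpuAndNeuralEngine, .cpuAndGPU, .cpuOnly; .cpuAndNeuralEngine requires iOS 16 or macOS 13 and later), and CoreML routes from there.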

The capability list, briefly

Task family	Typical model	Related Apple framework
Image classification, detection, segmentation	ResNet, MobileNet, YOLO variants	Vision
Text classification, NER, language ID	BERT-small, DistilBERT, custom	Natural Language
Speech recognition / TTS	Whisper-tiny, Conformer, distilled TTS	none (direct CoreML)
Tabular prediction	Tree ensembles, small MLPs	Create ML (training only; inference is plain CoreML)
Generative imagery	Stable Diffusion variants	none (direct CoreML)

Vision and Natural Language are wrappers that hand a preprocessed input to a CoreML model. They are useful when the input is a CVPixelBuffer or a String; for custom pipelines you call the model directly.

Where conversion silently degrades

The honest part of any CoreML overview is the failure modes. Three are common in our experience across mobile ML engagements:

Operator coverage gaps. Some PyTorch ops (custom attention kernels, certain scatter/gather patterns, dynamic shape branches) have no direct CoreML equivalent. coremltools will then either refuse to convert or fall back to a slower path. The second case is the more dangerous one: the model converts, but the Neural Engine route is lost and the latency budget collapses to GPU or CPU speeds.
Quantisation drift. CoreML supports float16, int8, and palettised weights. Quantising a TTS or speech model for on-device use often produces audible artefacts at the quality boundary (clicks, breathiness, or pitch jitter) that accuracy-style metrics of the kind used for image classification do not capture. This is an observed pattern across our cross-platform TTS deployments, not a benchmarked rate.
Dynamic-shape penalties. Variable input shapes (common in NLP and audio) force CoreML to recompile or fall back. Pinning a small set of fixed shapes at conversion time usually restores Neural Engine dispatch.
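The app-side half of shape pinning can be sketched as follows. It assumes the model was converted with a handful of enumerated sequence lengths (coremltools exposes this through ct.EnumeratedShapes) and pads each variable-length input up to the smallest pinned length that fits. The bucket sizes, pad value, and helper names are hypothetical placeholders, not values from any real model.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical pinned sequence lengths. In a real pipeline these must match the
# shapes enumerated when the model was converted, sorted ascending.
PINNED_LENGTHS: List[int] = [64, 128, 256, 512]
PAD_VALUE = 0.0  # hypothetical; use whatever value the model treats as padding


def pick_bucket(length: int, buckets: Sequence[int] = PINNED_LENGTHS) -> int:
    """Return the smallest pinned length that can hold `length` items."""
    i = bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(
            f"input length {length} exceeds the largest pinned shape {buckets[-1]}"
        )
    return buckets[i]


def pad_to_bucket(
    features: Sequence[float], buckets: Sequence[int] = PINNED_LENGTHS
) -> Tuple[List[float], int]:
    """Right-pad a variable-length input up to its pinned length.

    Returns the padded input plus the original length, so the caller can
    mask or trim the padded positions in the model's output.
    """
    target = pick_bucket(len(features), buckets)
    padded = list(features) + [PAD_VALUE] * (target - len(features))
    return padded, len(features)


if __name__ == "__main__":
    padded, valid = pad_to_bucket([0.5] * 100)
    print(len(padded), valid)  # 128 100
```

The design trade-off is between bucket count and waste: coarser buckets mean more computation spent on padded positions, while more buckets mean more shapes to convert and measure. Inputs longer than the largest bucket would need chunking, which this sketch leaves to the caller.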

A useful rule: assume the converted model is a different model until you have measured it on the target device against the same inputs as the source model.
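One way to apply that rule is to run both models on identical inputs and compare the raw outputs numerically, not just top-1 accuracy. The sketch below is self-contained. It stands in for the source model with a tiny float linear layer and for the converted model with the same layer after per-tensor symmetric int8 weight quantisation. That scheme is a simplification, and CoreML's own quantisation modes differ in granularity. All names are hypothetical. In a real check, the candidate outputs would come from the converted model running on the target device.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def quantise_int8_symmetric(weights: Matrix) -> Tuple[List[List[int]], float]:
    """Per-tensor symmetric int8 quantisation: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale


def dequantise(q: List[List[int]], scale: float) -> Matrix:
    """Map int8 codes back to float weights."""
    return [[v * scale for v in row] for row in q]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """A bias-free dense layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def compare_outputs(reference: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Max absolute error and signal-to-noise ratio (dB) of candidate vs reference."""
    errors = [c - r for r, c in zip(reference, candidate)]
    max_abs_err = max(abs(e) for e in errors)
    signal = sum(r * r for r in reference)
    noise = sum(e * e for e in errors)
    if noise == 0:
        snr_db = math.inf
    elif signal == 0:
        snr_db = -math.inf
    else:
        snr_db = 10 * math.log10(signal / noise)
    return {"max_abs_err": max_abs_err, "snr_db": snr_db}


# Identical, seeded inputs for both stand-in models.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(16)] for _ in range(8)]
x = [rng.gauss(0.0, 1.0) for _ in range(16)]

reference = linear(W, x)                     # stand-in for the source model
q, scale = quantise_int8_symmetric(W)
candidate = linear(dequantise(q, scale), x)  # stand-in for the converted model
metrics = compare_outputs(reference, candidate)
print(metrics)
```

For a speech or TTS model the same comparison would run on waveforms or spectrogram frames. A falling SNR is a cheap early warning, but it is unlikely to replace listening tests for the audible artefacts described earlier.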

Where CoreML fits in a cross-platform plan

CoreML runs only on Apple platforms. If the same model has to ship on Android, ONNX Runtime is the usual counterpart; on the web, ONNX Runtime Web (with its WebAssembly or WebGPU backends). The architectural choice, the one that determines whether you ship once or ship five times, is whether to distil a single small model that runs everywhere or to quantise per platform.

We documented the distillation path end-to-end in the TTS Inference Optimisation on Edge case study, including the CoreML and ONNX export pipeline and the on-device latency measurements that validated it. The trade-off is also covered in Cross-Platform TTS Inference on ONNX and CoreML.

For the mechanics of the conversion step itself — coremltools API, model surgery, and packaging — see A Gentle Introduction to CoreMLtools.

A short answer to the title

CoreML lets you run a converted machine learning model locally on Apple hardware, with automatic dispatch to the Neural Engine, GPU, or CPU. It does not solve the cross-platform deployment problem on its own; it is one runtime in a portability strategy that also has to account for ONNX Runtime on Android and the web. The work that matters is upstream of CoreML — conversion fidelity, operator coverage, and shape pinning — not in the framework’s API surface.
