An outerframe is like a web view, but native. It runs compiled machine code in a sandboxed process, and that code uses the underlying operating system’s APIs to create UI.
A fundamental reason web views are useful is that they let the user experience be driven by something external, while the web view serves as a safeguard. The outerframe keeps that host/content boundary, but replaces the conventional HTML/JavaScript UI runtime with platform-specific native code.
The key difference from the conventional web is that outerframe apps are multi-platform, not cross-platform. A macOS outerframe app is not the same artifact as a Linux outerframe app or a Windows outerframe app. Each platform can have its own outerframe implementation, and each implementation can lean into what that operating system is good at. This page describes the current macOS shape.
Contents
- Summary
- Rendering
- Content Startup
- Dealing with macOS directly
- Synchronous APIs
- Bring your own UI framework
- Text input and keypresses
- Copy/paste
- Drag and drop
- Temporary staging directory
- Colors
- Networking
- Accessibility / automation
- Socket messages
Summary
The outerframe is a hosted platform. Broadly, the host downloads a dynamically loaded library, loads it into a sandboxed process, embeds the rendered result into a rectangular region of its own UI, and exchanges messages with the content process for user interaction events and host-brokered APIs.
The sandboxed process renders the contents, but the host remains responsible for the boundary. It decides what the content can access directly, what must go through a proxy, and what must be mediated by native host UI.
On macOS, the outerframe content process gets:
- a sandboxed process,
- a socket connection to the host,
- a way to register a root CALayer,
- no direct network access, plus a restricted host proxy,
- host-mediated input, accessibility, navigation, and pasteboard messages,
- a host-provided temporary staging directory for file exchange.
Outerframe content on macOS is therefore not “a tiny AppKit app.” AppKit’s high-level UI objects are not designed to be safely embedded as visually-sandboxed content inside another app. Instead, content renders through lower-level primitives and asks the host to perform privileged native operations.
This API design is influenced by a few things specific to Apple’s AppKit:
- AppKit is not designed for creating visually-sandboxed UI (e.g. the content of a browser tab)
- AppKit relies on synchronous responses on the main thread (e.g. for setting whether copy/paste is available)
The first point means that macOS outerframe content is usually written in terms of lower-level primitives like layers, not views. The second point means some host APIs need synchronous content answers, such as whether Copy or Cut should be enabled.
The core outerframe API intentionally stays small. It does not try to be a complete UI framework. You can build or bring a framework on top of it, or you can render directly using the lower-level platform APIs.
Rendering
The philosophy of the outerframe is to lean on the operating system’s UI APIs. On macOS, the entrypoint to displaying content is the CALayer. Content creates a root layer and registers it with the host. From that layer tree, content can use Core Animation, Core Graphics, Core Text, Metal, AVFoundation, and similar low-level macOS frameworks, subject to the sandbox.
A single CALayer can be treated like a framebuffer or canvas, but layers can also form a tree. This lets content position elements above each other, use transparency, and update only part of the visual tree.
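As a minimal sketch of the layer-tree idea, the snippet below builds a root layer with two sublayers using only Core Animation. It does not show how the root layer is handed to the host, because the registration call on OuterframeAppConnection is not described on this page. The function name, sizes, and colors are illustrative assumptions.

```swift
import QuartzCore

/// Builds a small layer tree: an opaque background with a translucent
/// panel and a text label stacked above it. (Illustrative helper, not part
/// of the outerframe API.)
func makeRootLayer(size: CGSize) -> CALayer {
    let root = CALayer()
    root.frame = CGRect(origin: .zero, size: size)
    root.backgroundColor = CGColor(red: 1, green: 1, blue: 1, alpha: 1)

    // A translucent panel. Sublayers added later draw above earlier ones.
    let panel = CALayer()
    panel.frame = CGRect(x: 20, y: 20, width: 200, height: 120)
    panel.backgroundColor = CGColor(red: 0.2, green: 0.4, blue: 0.9, alpha: 0.5)
    panel.cornerRadius = 8
    root.addSublayer(panel)

    // A text layer positioned over the panel.
    let label = CATextLayer()
    label.frame = CGRect(x: 32, y: 32, width: 176, height: 24)
    label.string = "Hello from content"
    label.fontSize = 14
    label.foregroundColor = CGColor(red: 0, green: 0, blue: 0, alpha: 1)
    root.addSublayer(label)

    return root
}

let root = makeRootLayer(size: CGSize(width: 400, height: 300))
```

Updating only `label.string` or `panel.frame` later changes that part of the tree without redrawing the rest.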
Not all UI frameworks are compatible with rendering in a sandboxed separate process. Neither SwiftUI nor AppKit’s NSView hierarchy is the model here. If you want a conventional “view” abstraction, hit-testing, focus management, selection, and accessibility, you build that model in content code and render the result through layers. The host sends input events; content decides what those events mean.
Content Startup
On macOS, an outerframe content library exposes an OuterframeContentLibrary entrypoint. The host gives it:
- a socket file descriptor for outerframe messages,
- an OuterframeAppConnection,
- initialization data delivered over the socket.
The OuterframeAppConnection currently exists for one critical operation: registering the root CALayer. Runtime operations such as cursor updates, input mode changes, navigation, pasteboard integration, and accessibility updates are socket messages, not direct AppKit calls.
Dealing with macOS directly
Any time you are rendering something, doing layout, composing text, drawing images, or updating animation state, you are mostly working directly with macOS APIs. The outerframe protocol influences when you render and what capabilities are available, but the actual rendering work is native.
Rendering is only one part of an app platform. Other parts include mouse events, keyboard input, IME, text editing hotkeys, accessibility, native blinking carets, Dark Mode, context menus, copy/paste, drag/drop, and networking. For these, the macOS outerframe implementation mirrors the underlying operating system where it can: the host passes events down to content, and content sends messages back when it needs host-owned behavior.
Synchronous APIs
Some AppKit APIs are naturally synchronous. Menu validation needs an immediate answer about which edit commands are enabled. Drag-and-drop hit-testing needs an immediate answer about whether the current point accepts the drag. Accessibility queries may need a current tree while the host is answering the operating system.
The outerframe chooses to embrace this shape where it matches the operating system. For example, the host can send editCommandValidationRequest, pasteboardDropHitTestRequest, or accessibilitySnapshotRequest and wait for the matching response.
The safeguard is that every synchronous request has a timeout. If content does not answer quickly, the host falls back to a conservative result, such as disabling the menu item, rejecting the drag location, or returning no accessibility snapshot. Synchronous messages should be reserved for query-shaped operations that content can answer from current UI state.
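The timeout-with-fallback pattern can be sketched in a few lines. This is not the host's actual implementation: `ReplyBox` and `waitForAnswer` are hypothetical names, and a real host would send the request over the socket rather than call a closure.

```swift
import Foundation

/// Thread-safe holder for the first answer that arrives. (Hypothetical helper.)
final class ReplyBox<T>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: T?

    func set(_ newValue: T) {
        lock.lock()
        defer { lock.unlock() }
        if value == nil { value = newValue }
    }

    func get() -> T? {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

/// Calls `ask`, which delivers its answer through a reply callback, and waits
/// at most `timeout` seconds. Returns `fallback` if no answer arrives in time.
func waitForAnswer<T>(timeout: TimeInterval,
                      fallback: T,
                      ask: (@escaping @Sendable (T) -> Void) -> Void) -> T {
    let box = ReplyBox<T>()
    let done = DispatchSemaphore(value: 0)

    ask { answer in
        box.set(answer)
        done.signal()
    }

    guard done.wait(timeout: .now() + timeout) == .success else {
        return fallback   // conservative result, e.g. "menu item disabled"
    }
    return box.get() ?? fallback
}

// Content answers promptly: the real answer is used.
let copyEnabled = waitForAnswer(timeout: 0.5, fallback: false) { reply in
    DispatchQueue.global().async { reply(true) }
}

// Content never answers: the caller falls back to "disabled".
let cutEnabled = waitForAnswer(timeout: 0.05, fallback: false) { _ in }
```

A reply that arrives after the timeout is simply ignored, which keeps a slow content process from blocking the host's main thread.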
Bring your own UI framework
The outerframe keeps the platform small. It provides the core host/content boundary and leaves higher-level UI choices to content.
There are a few things you may expect from a web view that the outerframe doesn’t provide out of the box. It has the building blocks for everything you need, but it doesn’t pre-stack them for you. There’s no notion of a scrollable page, and there’s no single “correct” way to lay out text or perform text selection.
Rather than providing one standard library for all of this, outerframe content can build the abstractions it needs on top of the socket protocol and the native platform.
Text input and keypresses
Similar to scrolling / zooming, text input is an area where it’s usually best to lean on the operating system as much as possible. We rely on the operating system to take raw events (e.g. option-shift-left-arrow) and convert them to text commands (e.g. “highlight previous word”), and we let it handle IME input for non-English text.
For raw input events, the host sends key events derived from keyDown and keyUp. Content can also request text-input mode. In text-input mode, the host uses the native text input system and forwards operations such as inserting text, setting marked text, unmarking text, and performing text commands.
Content should switch input mode through the outerframe socket protocol. Use raw-key mode for game-like controls and keyboard shortcuts that content wants to interpret itself. Use text-input mode when a text field or editor has focus.
Copy/paste
Outerframe content does not talk to NSPasteboard directly. The host owns native pasteboard access and converts between AppKit pasteboard objects and outerframe pasteboard messages.
The outerframe pasteboard model mirrors NSPasteboardItem:
OuterframeContentPasteboardItem
    representations: [OuterframeContentPasteboardRepresentation]

OuterframeContentPasteboardRepresentation
    typeIdentifier: String
    data: Data
One outerframe pasteboard item becomes one native pasteboard item. Each representation becomes one native pasteboard type on that item. This preserves the common AppKit shape where a single item may have several equivalent representations, such as plain text, RTF, HTML, and a custom app type.
Copy and cut
Copy and cut are user-initiated host actions.
- The user chooses Copy or Cut.
- The host sends selectionToPasteboardCopyRequest or selectionToPasteboardCutRequest.
- Content replies with selectionToPasteboardResponse.
- The host writes the returned items to NSPasteboard.general.
The host asks content which edit commands are currently enabled with editCommandValidationRequest, and content replies with editCommandValidationResponse. This mirrors AppKit’s synchronous menu validation shape, but the request is batched so one response can validate a whole menu pass.
For Cut, content should both return pasteboard items and remove the selected content.
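The difference between Copy and Cut on the content side can be sketched as follows. The two pasteboard types mirror the field names of the model above, but declaring them as Swift structs is an assumption. `TextBuffer` is a hypothetical content-side model, and the message plumbing is left out.

```swift
import Foundation

// Mirrors the pasteboard model described above; the real declarations may differ.
struct OuterframeContentPasteboardRepresentation {
    var typeIdentifier: String
    var data: Data
}

struct OuterframeContentPasteboardItem {
    var representations: [OuterframeContentPasteboardRepresentation]
}

/// Hypothetical content-side text model with a single selection.
struct TextBuffer {
    var text: String
    var selection: Range<String.Index>?

    /// Items for the current selection, or an empty list if nothing is selected.
    func pasteboardItems() -> [OuterframeContentPasteboardItem] {
        guard let selection, !selection.isEmpty else { return [] }
        let plain = OuterframeContentPasteboardRepresentation(
            typeIdentifier: "public.utf8-plain-text",
            data: Data(text[selection].utf8))
        return [OuterframeContentPasteboardItem(representations: [plain])]
    }

    /// Copy: report the selection and leave the text unchanged.
    func copy() -> [OuterframeContentPasteboardItem] {
        pasteboardItems()
    }

    /// Cut: capture the items first, then delete the selected text.
    mutating func cut() -> [OuterframeContentPasteboardItem] {
        let items = pasteboardItems()
        if let selection, !selection.isEmpty {
            text.removeSubrange(selection)
            self.selection = nil
        }
        return items
    }
}

var buffer = TextBuffer(text: "Hello, world", selection: nil)
buffer.selection = buffer.text.range(of: "Hello, ")
let cutItems = buffer.cut()
```

The items returned by `cut()` are what content would place in its selectionToPasteboardResponse, and the buffer no longer contains the cut text.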
Paste
Paste is also user-initiated.
- The user chooses Paste.
- The host reads NSPasteboard.general.
- The host filters pasteboard items to the accepted types content has advertised.
- The host sends pasteboardContentPasted.
Content opts into host-delivered paste by sending setAcceptedPasteboardPasteTypes(...). An empty type list disables host-delivered paste.
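The filtering step might look roughly like the sketch below. It assumes the host filters per representation and drops items left with none. The page does not say whether filtering is per item or per representation, and `Item`, `Representation`, and `filterForPaste` are illustrative names rather than protocol types.

```swift
import Foundation

// Simplified stand-ins for the outerframe pasteboard model.
struct Representation {
    var typeIdentifier: String
    var data: Data
}

struct Item {
    var representations: [Representation]
}

/// Keeps only representations whose type content has advertised, and drops
/// items left with no representations. An empty accepted list yields nothing,
/// matching "an empty type list disables host-delivered paste".
func filterForPaste(_ items: [Item], acceptedTypes: Set<String>) -> [Item] {
    guard !acceptedTypes.isEmpty else { return [] }
    return items.compactMap { item -> Item? in
        let kept = item.representations.filter { acceptedTypes.contains($0.typeIdentifier) }
        return kept.isEmpty ? nil : Item(representations: kept)
    }
}

let pasteboard = [
    Item(representations: [
        Representation(typeIdentifier: "public.utf8-plain-text", data: Data("hi".utf8)),
        Representation(typeIdentifier: "public.rtf", data: Data()),
    ]),
    Item(representations: [
        Representation(typeIdentifier: "public.png", data: Data()),
    ]),
]

let textOnly = filterForPaste(pasteboard, acceptedTypes: ["public.utf8-plain-text"])
let disabled = filterForPaste(pasteboard, acceptedTypes: [])
```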
Programmatic pasteboard reads and writes are a separate capability. Content can request them through pasteboardAccessRequest, and the host can apply policy before replying.
Drag and drop
Drag and drop follows the same rule as paste: content never uses the native pasteboard service directly. The host owns NSDraggingSession, NSDraggingDestination, NSPasteboard, and NSFilePromiseProvider.
Dragging out
Content starts a native drag by sending beginDraggingPasteboardItems. Each dragged item carries pasteboard representations and may include a PNG preview image. The preview image affects source-side drag UI only; it is not written to the pasteboard and is not visible to the destination.
For small data such as text, images, URLs, or custom pasteboard types, content can provide ordinary pasteboard representations.
For files, content should usually use a file promise:
org.outerframe.file-promise
A file promise lets the native drag begin immediately. If Finder or another native destination accepts the promise and asks for the file, the host sends filePromiseWriteRequest. Content then writes or downloads the file into the host-provided staging directory and replies with filePromiseWriteResponse. The host copies the staged file into the native destination.
This avoids transferring large files for a drag that the user ends up cancelling.
Dragging in
Content opts into drops with setPasteboardDropBehaviorUniform(...) or setPasteboardDropBehaviorHitTest().
Uniform drop behavior accepts matching pasteboard types anywhere in the outerframe view. Hit-tested drop behavior asks content whether the current view-local point accepts the drag. The host sends pasteboardDropHitTestRequest during drag tracking, and content replies with an operation mask. A zero mask rejects that location.
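A content-side hit test can be as simple as the sketch below. `DropOperation`, `DropZone`, and the raw mask values are assumptions made for illustration. The protocol's real operation mask encoding is not specified on this page.

```swift
import Foundation

/// Hypothetical operation mask; the raw values are illustrative only.
struct DropOperation: OptionSet {
    let rawValue: UInt
    static let copy = DropOperation(rawValue: 1 << 0)
    static let move = DropOperation(rawValue: 1 << 1)
}

/// A rectangular region of the outerframe view that accepts drops.
struct DropZone {
    var frame: CGRect
    var operations: DropOperation
}

/// Returns the mask for the topmost zone containing `point`, or an empty
/// (zero) mask when no zone accepts the drag at that location.
func dropOperations(at point: CGPoint, zones: [DropZone]) -> DropOperation {
    // Zones later in the array are treated as being on top.
    for zone in zones.reversed() where zone.frame.contains(point) {
        return zone.operations
    }
    return []
}

let zones = [
    DropZone(frame: CGRect(x: 0, y: 0, width: 100, height: 100), operations: .copy),
    DropZone(frame: CGRect(x: 50, y: 50, width: 100, height: 100), operations: [.copy, .move]),
]

let inFirstZone = dropOperations(at: CGPoint(x: 10, y: 10), zones: zones)
let inOverlap = dropOperations(at: CGPoint(x: 60, y: 60), zones: zones)
let outside = dropOperations(at: CGPoint(x: 300, y: 300), zones: zones)
```

Because the host waits on this answer during drag tracking, the lookup should work from state content already has in memory.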
When the user drops local files, the host does not reveal the original filesystem paths to content. Instead:
- The host reads the native dragging pasteboard.
- The host clones or copies each dropped file into the staging directory.
- The host sends content a pasteboardContentDropped message.
- Dropped files are represented using:
org.outerframe.dropped-file-access
The dropped-file-access payload contains metadata, a staged local path, and an access ID. Content reads the staged file from inside its sandbox. When content is done, it sends releaseDroppedFileAccess(accessID:), allowing the host to remove that temporary access directory.
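One way to make sure the release message is always sent is to scope file access to a closure. The sketch below assumes the payload has already been decoded into a hypothetical `DroppedFileAccess` value. The `release` closure stands in for sending releaseDroppedFileAccess(accessID:), and the payload's actual wire format is not covered here.

```swift
import Foundation

/// Hypothetical decoded form of a dropped-file-access payload.
struct DroppedFileAccess {
    var stagedPath: String
    var accessID: String
}

/// Runs `body` with the staged file URL, then calls `release` exactly once,
/// whether `body` returns normally or throws.
func withDroppedFile<T>(_ access: DroppedFileAccess,
                        release: (String) -> Void,
                        body: (URL) throws -> T) rethrows -> T {
    defer { release(access.accessID) }
    return try body(URL(fileURLWithPath: access.stagedPath))
}

// Simulate a host-staged file so the example is self-contained.
let stagedURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("dropped-\(UUID().uuidString).txt")
try Data("dropped bytes".utf8).write(to: stagedURL)

var releasedIDs: [String] = []
let access = DroppedFileAccess(stagedPath: stagedURL.path, accessID: "access-1")

let contents = try withDroppedFile(access, release: { releasedIDs.append($0) }) { url in
    try String(contentsOf: url, encoding: .utf8)
}
```

Content that needs the file for longer should copy what it needs out of the staged path before releasing, since the host may remove the access directory afterwards.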
Temporary staging directory
The host provides a temporary staging directory for file exchange. The content process receives it as:
OUTERFRAME_STAGING_DIR
Use this directory for:
- promised files that content creates for drag-out,
- files that content wants to put on the general pasteboard as public.file-url,
- host-staged files delivered from native drops.
Do not treat this as persistent app storage. It is temporary exchange space. A host may prune it, and content should not store durable references to paths inside it.
For hosted launches, this directory is also the content process’s TMPDIR. Code that uses FileManager.default.temporaryDirectory, NSTemporaryDirectory(), or OUTERFRAME_STAGING_DIR should land in the same origin-scoped temporary area. The host may still reserve documented subdirectories inside it, such as dropped-file-access, for host-managed file leases.
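A small helper for writing into the staging area might look like this. The fallback to the system temporary directory when OUTERFRAME_STAGING_DIR is unset is an assumption for running outside a host. The UUID-named subdirectory is one illustrative way to avoid filename collisions, not a documented convention.

```swift
import Foundation

/// Resolves the staging directory from the environment. Falls back to the
/// process temporary directory when the variable is missing (assumed
/// behavior for non-hosted runs).
func stagingDirectory() -> URL {
    if let path = ProcessInfo.processInfo.environment["OUTERFRAME_STAGING_DIR"],
       !path.isEmpty {
        return URL(fileURLWithPath: path, isDirectory: true)
    }
    return FileManager.default.temporaryDirectory
}

/// Writes `data` into a fresh subdirectory of the staging area and returns
/// the file URL. The file keeps its intended name for the destination.
func stageFile(named name: String, data: Data) throws -> URL {
    let directory = stagingDirectory()
        .appendingPathComponent(UUID().uuidString, isDirectory: true)
    try FileManager.default.createDirectory(at: directory,
                                            withIntermediateDirectories: true)
    let fileURL = directory.appendingPathComponent(name)
    try data.write(to: fileURL, options: .atomic)
    return fileURL
}

let staged = try stageFile(named: "report.txt", data: Data("exported".utf8))
```

The returned URL is only meaningful for the current exchange, such as answering a file promise, and should not be stored for later sessions.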
Colors
macOS users expect their apps to look good in Light Mode, Dark Mode, and accessibility color modes. Usually, you should derive colors from dynamic AppKit colors such as NSColor.labelColor, NSColor.textBackgroundColor, and related semantic colors.
Because outerframe content renders through lower-level primitives like CALayer, dynamic colors need to be resolved under the host-provided NSAppearance before assigning CGColor values:
effectiveAppearance.performAsCurrentDrawingAppearance {
    layer.backgroundColor = NSColor.textBackgroundColor.cgColor
}
The host sends appearance information during initialization and again when the effective appearance changes. Content should update its layer colors in response:
func updateAppearance(_ appearance: NSAppearance) {
    appearance.performAsCurrentDrawingAppearance {
        layer.backgroundColor = NSColor.textBackgroundColor.cgColor
        textLayer.foregroundColor = NSColor.labelColor.cgColor
    }
}
The important rule is to store enough state to redraw semantic colors when appearance changes. Do not resolve a dynamic AppKit color once at startup and assume it will remain correct for the life of the content process.
Networking
Outerframe content can use macOS’s built-in networking APIs, but its sandbox blocks direct network access. Instead, it connects through a local proxy provided by the host.
The proxy controls network policy. It enforces same-origin rules, so content does not have unfettered access to the network. The host can also map an origin to something that is not a normal TCP connection. For example, a host can use the proxy to let content connect to a server over SSH, or to connect to a local Unix socket file.
When you use macOS’s URLSession API, configure it to use the host-provided local proxy. The proxy configuration is delivered during initialization.
The proxy configuration includes a host, port, username, and password. The credentials are generated for a single content launch and authorize the content process to use only the routes the host registered for that outerframe app. Treat them as process-local secrets: use them for the proxy connection, but do not persist them.
let configuration = URLSessionConfiguration.default
configuration.connectionProxyDictionary = [
    kCFNetworkProxiesHTTPEnable as String: true,
    kCFNetworkProxiesHTTPProxy as String: proxyHost,
    kCFNetworkProxiesHTTPPort as String: proxyPort,
    kCFProxyUsernameKey as String: proxyUsername,
    kCFProxyPasswordKey as String: proxyPassword
]
let session = URLSession(configuration: configuration)
Accessibility / automation
Outerframe content should expose an accessibility tree through the outerframe protocol. The host integrates that tree into the native accessibility system.
This is separate from rendering. If content draws a button into a layer, the host cannot infer that it is a button. Content needs to report the semantic tree, current focus, actions, and updates.
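To make "semantic tree" concrete, here is one possible shape for a node that content keeps alongside its layers. Every name in it is hypothetical: the protocol's real accessibility message format is not described on this page, and the role and action strings are placeholders.

```swift
import Foundation

/// Hypothetical semantic node; the real protocol's fields may differ.
struct AccessibilityNode {
    var role: String                       // e.g. "button", "group"
    var label: String
    var frame: CGRect                      // assumed to be in view coordinates
    var actions: [String] = []
    var isFocused = false
    var children: [AccessibilityNode] = []

    /// Depth-first search for the node that currently has focus.
    func focusedNode() -> AccessibilityNode? {
        if isFocused { return self }
        for child in children {
            if let found = child.focusedNode() { return found }
        }
        return nil
    }
}

// A toolbar that is drawn into layers but described semantically here.
let tree = AccessibilityNode(
    role: "group",
    label: "Toolbar",
    frame: CGRect(x: 0, y: 0, width: 400, height: 44),
    children: [
        AccessibilityNode(role: "button",
                          label: "Save",
                          frame: CGRect(x: 8, y: 8, width: 60, height: 28),
                          actions: ["press"],
                          isFocused: true)
    ])

let focused = tree.focusedNode()
```

Keeping this tree in sync with what the layers draw lets content answer a snapshot request from current state instead of recomputing it on demand.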