Arab Security Cyber Wargames Final Round Deepfake Challenge Writeup

27 Jun 2026

Are You General Ludd?

Deepfake has been around for years now, and it’s not going anywhere soon. In fact, it’s becoming more and more predominant in our daily lives. A picture here, a video there, a small tweak in one’s media to improve or remove something entirely, and then we woke up to what we have now.

This challenge aimed to shine some light on the importance of deepfake forensic analysis and how to approach such challenge from a forensic perspective.

The description goes as follows:

The infamous hacktivist group "General Ludd" after their raid on the federal treasury, 
their leader is attempting to escape. In order to do so, he changed his face, 
our taskforce raided the surgeon's clinic and managed to get a picture of the deepfake 
generated picture of the group's leader new face. 

Your task as a DF analyst, should you choose to accept it, is to find the parameters 
that indicate that this picture is deepfake generated and not a real one.

Flag Format: ASCWG{SHA1}

Flag Structure: echo -n "A Value,B Value,C Value,D Value,E Value" sha1sum 

NOTE: when writing the script set sigma value to "1.0" to get the correct results

Challenge file: RobDoe.png

Rob_Doe

The very first instinct the players had was to hand the picture along the description to any AI model to extract the values. This method of thinking and such behavior are the ones I abhor.

In our over-reliance on tools and frameworks, we lost our access to the most fundamental object that Digital Forensics is built on. UNDERSTANDING THE TRUE MEANING OF THE ARTIFACT. In addition to that players are increasingly shutting down brains and using AI or LLMs to solve basic challenges.

I have a saying that I always say, when tools fail you. What will you do? This question is what keeps me questioning my methodology and what lies there that needs improvement?

I wanted the players to get a reality check from this challenge, that without the proper investigative mindset, you’ll fail. you’ll be throwing blind punches — no stance, no read, no target — and landing none of them.

The methodology isn’t built on suspicion. It’s built on physics

real photograph doesn’t just exist — it’s the end product of a chain. Light passes through a lens, hits a sensor, gets processed by an Image Signal Processor (ISP), and gets compressed by a codec. Every stage leaves a mark. Not a visible one, but rather a forensically — properly analyzed and extracted — artifact that aims to prove or disprove the validity of the media an investigator is analyzing.

A GAN optimizes for one thing: convincing a discriminator. It learns to produce pixel arrangements that look real. What it doesn’t learn — what it has no incentive to learn — is how to fake the physical residue of an imaging pipeline it never went through.

The pipeline has layers. The GAN bypassed all of them. So the approach is simple in principle, demanding in execution — probe each layer independently, extract what the acquisition process would have left behind, and measure what the GAN failed to fake. Five metrics. Five different angles of investigation aimed in one direction, to validate the authenticity of the picture in question in a proper forensic manner.

Without any further ado, these are the five metrics that players should have investigated, and their forensic value.

1. Noise Residual Mean — The Sensor Doesn’t Lie

Every real camera sensor is a flawed instrument. Shot noise, read noise, thermal noise — the hardware bleeds its imperfections into every pixel it captures. That noise isn’t a bug. In forensic terms, it’s a SIGNATURE. When you subtract a Gaussian-blurred version of the image from the original, what remains is that signature — the residual. A real photograph’s residual is textured, organic, consistent with hardware acquisition. A GAN-generated image? The residual is wrong. Not absent — wrong. The generator learned to synthesize texture, not noise. And those are not the same thing. The mean of that residual tells you how strong that deviation is — averaged across every pixel in the image.

2. Noise Residual Standard Deviation — Inconsistency as Evidence

The mean tells you the strength of the residual. The standard deviation tells you something different — how consistent that residual is across the image. A real sensor produces noise that follows a known statistical distribution tied to the hardware. A GAN doesn’t. The generator synthesizes different regions of the face differently — skin, hair, background, edges — and each region leaves a different residual fingerprint. That inconsistency across regions is what the standard deviation captures. Uniform residual means consistent acquisition. Erratic residual means something else was at work.

3. Laplacian Variance — The Lens Has a Memory

Real optics produce specific edge behavior. Sharpness gradients, depth-driven blur, natural falloff at boundaries — a lens renders a scene with physical constraints the GAN has never encountered. The Laplacian operator — a 3x3 kernel applied to the luminance channel — probes exactly that. It responds to edges, to transitions, to the second derivative of intensity across the image. In a real photograph, that response has character. In a GAN-generated image, edges tend toward one of two failure modes: OVER-SHARPENED boundaries that no real lens produced, or unnaturally smooth transitions where texture should exist. The variance of the Laplacian response quantifies which failure mode is present — and how severe.

4. NR_Score — The Composite Argument

No single metric is a verdict. A noisy residual alone could be a high-ISO photograph. An irregular Laplacian alone could be a heavily processed image. But three metrics that all point in the same direction — that’s a forensic argument. The NR_Score is a weighted composite of the residual mean, residual standard deviation, and scaled Laplacian variance — a single value that represents the cumulative noise fingerprint deviation from what a real imaging pipeline would have produced. Alpha, beta, gamma. Weights that reflect the relative forensic contribution of each component. A number the image can’t negotiate.

A word of transparency — alpha, beta, and gamma are empirically calibrated values, not forensically derived ones. They were tuned for this specific context: face images, GAN output, this resolution range. Knowing the difference between a calibrated tool and a validated methodology isn’t a disclaimer — it’s the foundation of sound forensic science.

5. Error Level Analysis — The Codec Remembers Everything

Every image that passes through a JPEG encoder gets divided into 8x8 blocks, each compressed independently based on its content. A real photograph has a consistent compression history — every block went through the same pipeline, at the same quality level, at the same time. When you resave that image at a known quality, the error between the original and the resaved version is UNIFORM. Synthesized content doesn’t share that history. GAN-generated textures compress differently from photographic content — the error level diff exposes the inconsistency as brightness variation in the difference image. Amplify that difference by 30x and it becomes visible. Collapse it to a mean intensity and it becomes measurable.

It is worth noting that classical ELA was designed to detect intra-image compression inconsistency — regions spliced in after the fact that don’t share the original’s compression history. What we’re actually measuring is how synthesized texture behaves under DCT compression differently from photographic content. The signal is real, yet the mechanism is different.

The Solution

The first finding didn’t require a script. It didn’t require a tool. It required reading.

Rob_Doe.png.

Rob is short for Robert. Doe is the universal placeholder for an unidentified individual. The challenge handed every player the identity of the subject before the image was even opened, and it went completely unnoticed. Not because it was hidden. Because no one was looking. That oversight set the tone for everything that followed.

Upon opening the image, the first thing a trained forensic eye registers — the picture is TOO CLEAN. The over-smoothness of the skin texture is a trademark in almost all AI and GAN-generated media. These pictures are too clean. Real skin has pores, micro-shadows, asymmetric imperfections.

This image has none of that, looking further, the eye levels are irregular — facial geometry that doesn’t hold up under scrutiny. The light composition and shadow contrast don’t cohere. And the person in the picture bears an obvious resemblance to Robert Downey Jr. With these basic observations cleared out of the way, that’s where the investigation starts — and that’s the mindset that carries it through to the flag.

In order to properly investigate the picture in question we need to write a script to compute the 5 metrics mentioned earlier, but what truly lies under the hood is one note I mentioned explicitly in the description. Sigma is set to 1.0.

Sigma controls the spread of the Gaussian blur — how aggressively the image is smoothed before subtraction. The blurred image is your signal estimator — your best approximation of the image stripped of its acquisition noise. What remains after subtraction is the residual. The noise fingerprint.

Too low — the blur barely touches the image, the residual captures noise AND fine texture detail. You can’t separate the two. Too high — the blur overcooks it, genuine texture bleeds into the residual and corrupts the signal.

The slightest deviation from 1.0 and the ResidualMean, ResidualStd, and NR_Score shift. The SHA1 flag doesn’t match.

The idea and the methodology that arrived later weren’t fully formed. It started with what I already knew — ELA and noise residual analysis had been on my radar for a while, tools I understood well enough to trust in a forensic context. Laplacian variance was newer territory that I need to explore, something I started reading into and found compelling for what it actually measures: not the image itself, but how the lense and optics left their mark on it. The residual metrics came later, as the research started connecting dots. Once you understand what a real sensor leaves behind, the residual isn’t an exotic concept — it’s the most direct way to ask whether any hardware was involved in producing this image at all. The five metrics didn’t feel like a design decision by the end. They felt like the only logical set of questions to ask of an image that claimed to be real but had no physical history to show for it.

The script is as follows:

#!/usr/bin/env python3

import argparse
import numpy as np
from PIL import Image, ImageFilter, ImageChops, ImageEnhance
import os

# handle RGBA/palette modes before any processing
def to_rgb(img):
    if img.mode in ("RGBA", "LA"):
        return img.convert("RGB")
    if img.mode != "RGB":
        return img.convert("RGB")
    return img

def blur(img, sigma):
    return img.filter(ImageFilter.GaussianBlur(radius=sigma))

def lap_var(arr):
    # BT.601 luminance
    Y = 0.299*arr[...,0] + 0.587*arr[...,1] + 0.114*arr[...,2]

    # 3x3 laplacian kernel
    K = np.array([[0, 1, 0],
                  [1,-4, 1],
                  [0, 1, 0]], dtype=np.float32)

    # reflect pad to avoid border bias
    Yp = np.pad(Y, 1, mode="reflect")
    L = (K[0,1]*Yp[0:-2,1:-1] + K[1,0]*Yp[1:-1,0:-2] + K[1,1]*Yp[1:-1,1:-1] +
         K[1,2]*Yp[1:-1,2:]   + K[2,1]*Yp[2:,1:-1])
    return float((L / 255.0).var())

def ela(img, path, q=90, amp=30):
    base = os.path.splitext(path)[0]
    tmp  = base + "_resaved.jpg"
    out  = base + "_ELA.png"

    img.save(tmp, 'JPEG', quality=q)
    diff = ImageChops.difference(img, Image.open(tmp))
    ela_img = ImageEnhance.Brightness(diff).enhance(amp)
    ela_img.save(out)

    return round(np.array(ela_img).mean() / 255, 2)

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("image")
    ap.add_argument("--sigma", type=float, default=1.0)  # don't touch this
    ap.add_argument("--alpha", type=float, default=0.6)
    ap.add_argument("--beta",  type=float, default=0.3)
    ap.add_argument("--gamma", type=float, default=0.1)
    args = ap.parse_args()

    img    = to_rgb(Image.open(args.image))
    img_np = np.asarray(img, dtype=np.float32)

    # noise residual
    b       = np.asarray(blur(img, args.sigma), dtype=np.float32)
    abs_res = np.abs(img_np - b) / 255.0
    r_mean  = float(abs_res.mean())
    r_std   = float(abs_res.std())

    # laplacian
    lv        = lap_var(img_np)
    lv_scaled = min(lv * 50.0, 1.0)

    # composite score
    nr = args.alpha*r_mean + args.beta*r_std + args.gamma*lv_scaled

    # ela
    ela_mean = ela(img, args.image)

    print(f"ELA Mean Intensity : {ela_mean}")
    print(f"ResidualMean       : {r_mean:.4f}")
    print(f"ResidualStd        : {r_std:.4f}")
    print(f"LaplacianVar       : {lv:.6f}")
    print(f"NR_Score           : {nr:.4f}")
    print()
    print(f"Flag string : {ela_mean},{r_mean:.4f},{r_std:.4f},{lv:.6f},{nr:.4f}")

if __name__ == "__main__":
    main()

The correct values are as follows:

ELA Mean Intensity : 0.15
ResidualMean       : 0.0061
ResidualStd        : 0.0123
LaplacianVar       : 0.001481
NR_Score           : 0.0147

Flag string        : 0.15,0.0061,0.0123,0.001481,0.0147

Flag: ASCWG{93b82a849b1532afe57b33f908f00cee27eab0c3}

And that concludes the writeup.

0x01g33k@home:~$

Archive

About

RSS