29c3ctf Web100 Regexdb Writeup

**Description**

Ever played Googlewhack? Well, this is a bit [easier][2] and gives you more power, enjoy.

Quote From google shared doc:

题目给的提示是googlewhack,这个的意思就是查询时仅返回一条结果。

首先经过测试,所有的输入要在--之间。  
比如:输入 -- 返回结果是 18 .

输入 - . - 返回结果是: results: 1: 29C3_NotAKey

Some useful information:

Query language “analysis” (Jay):

\- . is single character
\- * is wildcard
\-  | is OR operator
\-  ^ is BEGINNING OF STRING operator

Examples

** - 18 results (total in db)
*29C3* - 12 results
*^29C3* - 6 results
*.{40}* - 3 results (all results which are 40 chars long)
**!!This is a perl regular expression engine!!**

First step is to sort all the 18 result by length, use a simple script to help.

wget -q -O - --post-data="input=-$1-" "http://94.45.252.233/" | grep Results

Using the regex : ^.{$length}$ to seperate all the input data.

Then we can got almost all the wrong answers:

29C3_
Key: None
Key: 29C3_
29C3_Wrong
Hello World!
29C3_NoBadOne
29C3_____NO_____
Wrong: 29C3_Wrong
K3y: 29C3_AlsoBad
29C3_ImSimplyWrong
23 23 23 23 23 23 23 23
This one is unrelated...
Key: 29C3_AnotherOneWhichIsWrong
29C3_NotAKey <- this one is not a key
42 42 42 42 42 42 42 42 42 42 42 42 42 42
Key: 29C3_??        length 40
Key: 29C3_??        length 40

Then we know the answer is one of the last 2 key.

(after got the flag,we know that they are both the keys)

Construct the regex expression ^.{$left}[$query_set].{$right}$

Here we need $left+$right equal to 39 and find the unique $query_set,then it’s the answer.

We can simply use escape grammar to do this. It does not support [a-zA-Z] grammar,

so we have to write all the query characters.

Starts from $query="\1\2...\7f",check the result wether include character ’2′

Binary search here is a better choice.

After all above, we got the answer:

Key: 29C3_Well.This/Is#Not+The|Wrong?Key