Wargames.my 2019 - stream writeup

mkhdznfq

2020-12-20

i originally authored this challenge for hitb gsec in mid-2019. but since no team was able to solve the challenge, the “main idea” has been recycled through multiple ctfs including wgmy 2019 (yeah, i’m a one lazy-ass mofo)

same

1.5 years passed, and i have yet to see any writeup available for this challenge publicly. hence this post.

tl;dr

regenerate dropped initial vectors by recovering the state of mersenne twister
use recovered encrypted messages to launch klein’s attack to crack the rc4 key
???
profit

initial analysis

import random
import secrets
import string
import hashlib
import base64
 
def KSA(key):
  keylength = len(key)
  S = list(range(256))
  j = 0
  for i in range(256):
    j = (j + S[i] + key[i % keylength]) % 256
    S[i], S[j] = S[j], S[i]
  return S
 
def PRGA(S):
  i = 0
  j = 0
  while True:
    i = (i + 1) % 256
    j = (j + S[i]) % 256
    S[i], S[j] = S[j], S[i]
    K = S[(S[i] + S[j]) % 256]
    yield K
 
def ARCFOUR(key, data):
  S = KSA(key)
  keystream = PRGA(S)
  return bytes(char ^ next(keystream) for char in data)
 
pwd = b''.join(secrets.choice(string.printable).encode() for _ in range(20))
digest = hashlib.md5(pwd).digest()
data = b'you got it!! flag: ' + open('flag.txt', 'rb').read()
rand = random.Random()
 
for _ in range(420): # blaze it
  iv = (rand.getrandbits(64)).to_bytes(8, byteorder='big')
  key = iv + digest
  ctxt = ARCFOUR(key, data)
  print(base64.b64encode(iv + ctxt).decode())
 
for _ in range(100000):
  iv = (rand.getrandbits(64)).to_bytes(8, byteorder='big')
  key = iv + digest
  ctxt = ARCFOUR(key, data)
  print(base64.b64encode(b'NOMOREIV' + ctxt).decode())

contestants were provided with the above python script which ran on a remote server. they were also given the ip:port pair which basically spits out the output of the script once contacted. output example:

output

looking at the source code, the program is quite straightforward. it first creates a random 20-chars password. then it uses its md5 hash as a key to rc4-encrypt a message containing the flag. for the first 420 times the message being encrypted, the results were printed to stdout together with the original initial vectors. the next 100k encrypted data, however, were printed with NOMOREIV as stand-ins. simple right?

further analysis shows that the random password is generated using a cryptographically secure rng function in the secrets module. also, the fact that a new password is regenerated and hashed on every fork made it foolproof and impractical to be guessed. brute-forcing it would be downright impossible due to it having 255^16 (3.1962658e+38) possible combinations. therefore, we can fully shift our focus to rc4 encryption weaknesses.

rc4 weakness

unlike most modern stream ciphers, rc4 does not include any nonce/iv during the encryption. as a result, a single key cannot be used to securely encrypt multiple messages. one popular approach to address this issue is to hash initial vectors with the key but many implementations just simply concatenate them (i’m looking at you wep standard). thus, with a fairly weak key scheduling algorithm (ksa), it is possible to recover the key with a statistical attack due to the correlation of the keys and biased outputs from the encryption. this weakness was heavily exploited during the great wifi cracking era (anyone remembers the glorious days of aircrack-ng + cheap chinese wireless adapter? lol). over the years, a lot of variations of the attacks were discovered, but they all have one thing in common; the nonces/ivs need to be known beforehand for the attack to succeed.

recovering initial vectors

as known before, only the first 420 encrypted messages were paired with their respective ivs. these ivs were generated using the random module with the size of 64 bits or 8 bytes. luckily, the rng function provided by the module is pseudorandom and the original state could be recovered as long we have enough samples. the prng function is based on the widely used mersenne twister (mt19937) and a simple google search will provide quite a few pocs on how to recover them. i.e. https://github.com/eboda/mersenne-twister-recover.

but wait! there’s a catch. the internal sate of mersenne twister is only 32 bits and our ivs came in 64 bits. thus, we need to find exactly how the ivs were extracted from the internal state before we can plug them into our recovery routine. now, if we check the implementation of getrandbits, we can see that the bits were filled out by 32-bit words, starting from least significant to most significant. meaning, our 64 bits ivs could be split into two states where the least significant 32 bits will be the former state followed by the next 32 significant bits being the latter. also, note that our ivs bitlength is in the multiplication of 32, ergo no bits were omitted and we could fully recover the internal states. to validate this theory, we could use the following python snippet:

Python 3.8.6 (default, Sep 30 2020, 04:00:38)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(0)
>>> states1 = [random.getrandbits(32), random.getrandbits(32)]
>>> random.seed(0)
>>> tmp = random.getrandbits(64)
>>> states2 = [tmp & 0xffffffff, tmp >> 32]
>>> assert states1 == states2

based on this, we could salvage 840 states of the random generator from the provided 420 ivs. these samples turn to be large enough for us to recover the original state and regenerate the dropped ivs. now that we have a way to restore them, let’s continue and take a look at the statistical attack to retrieve the rc4 key.

klein’s attack

meth

there are actually multiple adaptations of the related key attack. and the most famous among them is fluhrer, mantin, and shamir (fms) attack which gains popularity for breaking wep protected wireless network. unfortunately, this attack will only work on certain weak initial vectors and require us to collect a large sample (~10 million?) of encrypted messages for it to succeed. however, in 2005, a. klein manages to prove more correlation between the rc4 keystream and the key used for encrypting the message. his theory was used as the basis of the powerful pyshkin, tews, and weinmann (ptw) attack that greatly increase the probability of success with only 100k (or less) encrypted messages.

in his paper, klein showed that l-X[l-1] will takes the value of S[l] with a probability of 2/n, where:

l = length of key
X = keystream
S = s-box internal state
n = s-box size (256 for rc4)

therefore, we could use the following formula to recover the key:

notation

where:

K = key used
S_l = s-box internal state after l iteration
j = pointer to element in S
j_l = j value after l iteration

the above formula will result with the key at l index approximately 2/n times. hence we can use all 100k encrypted messages to “vote” for the most probable key and choose the key with the most occurrences. also note that we will need the keystream, X, for this attack to work. the keystream could be obtained by xoring the encrypted messages with known plaintext. looking at the source code, we could only find 19 bytes of known plaintext; you got it!! flag:. therefore, we are still 5 bytes short since we will need 24 bytes of keystream (length of initial vector + size of md5 hash). for this, we could abuse the fact that the ctf flag format will start with wgmy{ which is “coincidently” 5 bytes ;p. now that all requirements are fulfilled, it’s time to implement them.

poc||gtfo

import pwn
import base64
import random
 
# https://github.com/eboda/mersenne-twister-recover
class MT19937Recover:
  """Reverses the Mersenne Twister based on 624 observed outputs.
  The internal state of a Mersenne Twister can be recovered by observing
  624 generated outputs of it. However, if those are not directly
  observed following a twist, another output is required to restore the
  internal index.
  See also https://en.wikipedia.org/wiki/Mersenne_Twister#Pseudocode .
  """
  def unshiftRight(self, x, shift):
    res = x
    for _ in range(32):
      res = x ^ res >> shift
    return res
 
  def unshiftLeft(self, x, shift, mask):
    res = x
    for _ in range(32):
      res = x ^ (res << shift & mask)
    return res
 
  def untemper(self, v):
    """ Reverses the tempering which is applied to outputs of MT19937 """
 
    v = self.unshiftRight(v, 18)
    v = self.unshiftLeft(v, 15, 0xefc60000)
    v = self.unshiftLeft(v, 7, 0x9d2c5680)
    v = self.unshiftRight(v, 11)
    return v
 
  def go(self, outputs, forward=True):
    result_state = None
 
    assert len(outputs) >= 624       # need at least 624 values
 
    ivals = []
    for i in range(624):
      ivals.append(self.untemper(outputs[i]))
 
    if len(outputs) >= 625:
      # We have additional outputs and can correctly
      # recover the internal index by bruteforce
      challenge = outputs[624]
      for i in range(1, 626):
        state = (3, tuple(ivals+[i]), None)
        r = random.Random()
        r.setstate(state)
 
        if challenge == r.getrandbits(32):
          result_state = state
          break
    else:
      # With only 624 outputs we assume they were the first observed 624
      # outputs after a twist -->  we set the internal index to 624.
      result_state = (3, tuple(ivals+[624]), None)
 
    rand = random.Random()
    rand.setstate(result_state)
 
    if forward:
      for i in range(624, len(outputs)):
        assert rand.getrandbits(32) == outputs[i]
 
    return rand
 
def KSA(key, n_key):
  keylength = len(key)
  S = list(range(256))
  j = 0
  for i in range(n_key):
    j = (j + S[i] + key[i % keylength]) % 256
    S[i], S[j] = S[j], S[i]
  return S, j
 
def PRGA(S):
  i = 0
  j = 0
  while True:
    i = (i + 1) % 256
    j = (j + S[i]) % 256
    S[i], S[j] = S[j], S[i]
    K = S[(S[i] + S[j]) % 256]
    yield K
 
def ARCFOUR(key, data):
  S = KSA(key, 256)[0]
  keystream = PRGA(S)
  return bytes(char ^ next(keystream) for char in data)
 
host = '127.0.0.1'
port = 12345
n_iv = 100000
rkeys = []
 
r = pwn.remote(host, port)
 
with pwn.log.progress('Recovering Mersenne Twister state') as p:
  states = []
  n_mt = 420
  for i in range(n_mt):
    enc = base64.b64decode(r.recvline().rstrip())
    iv_i = int.from_bytes(enc[:8], byteorder='big')
    states.append(iv_i & 0xffffffff)
    states.append(iv_i >> 32)
    p.status('{}/{}'.format(i+1, n_mt))
  mtr = MT19937Recover()
  rand = mtr.go(states)
 
with pwn.log.progress('Fixing ciphertexts with recovered IVs') as p:
  with open('ciphertexts.txt', 'w') as fp:
    for i in range(n_iv):
      enc = base64.b64decode(r.recvline().rstrip())
      iv = (rand.getrandbits(64)).to_bytes(8, byteorder='big')
      fp.write(base64.b64encode(iv + enc[8:]).decode() + '\n')
      p.status('{}/{}'.format(i+1, n_iv))
 
r.close()
with open('ciphertexts.txt') as fp:
  ivs = fp.readlines()
 
with pwn.log.progress('Executing Klein\'s attack on ciphertexts') as p:
  n_rkey = 16                           # length of md5 in bytes
  ptxt = b'you got it!! flag: wgmy{'    # len(ptxt) == len(iv + n_rkey)
  for i in range(n_rkey):
    candidates = [0] * 256
    for i2 in range(n_iv):
      enc = base64.b64decode(ivs[i2].rstrip())
      iv, ctxt = enc[:8], enc[8:]
      key = iv + bytes(rkeys)
      n_key = len(key)
      keystream = bytes(a ^ b for (a, b) in zip(ctxt, ptxt))
      S, j = KSA(key, n_key)
      rkey = (S[(n_key - keystream[n_key - 1]) % 256] - (S[n_key] + j)) % 256
      candidates[rkey] += 1
    rkeys.append(candidates.index(max(candidates)))
    p.status('{}/{}'.format(i+1, n_rkey))
 
pwn.log.info('Root key: {}'.format(':'.join('{:02x}'.format(x) for x in rkeys)))
enc = base64.b64decode(ivs[0].rstrip())
iv, ctxt = enc[:8], enc[8:]
key = iv + bytes(rkeys)
pwn.log.info('Plain text: {}'.format(ARCFOUR(key, ctxt).decode()))

poc