
Malware Analysis - Encoding/Decoding to Mask/Unmask Hackers' Dirty Deeds - Base64

brencronin

Updated: Nov 29, 2024

One critical aspect of malware behavior is data obfuscation, where malware conceals its activities through techniques such as Base64, XOR operations, the Caesar cipher, ROL (rotate left), custom encoding, and encryption. This article focuses on Base64, primarily an encoding technique used to transmit binary data over text-based protocols and occasionally used for obfuscation. Base64 serves multiple purposes, including data transmission, file embedding, and data obfuscation. It is also often used to represent binary values such as hash digests as text, although Base64 itself is not a hashing algorithm.


Exploring Base64: An Introduction Through ASCII


In computer systems, data is represented using binary digits (bits), which consist of either a "1" or a "0". These bits serve as the fundamental building blocks of digital information. In computing hardware such as CPU, memory, and disk, binary digits denote the state of electrical signals, with "1" representing an "on" state and "0" indicating an "off" state. This binary representation forms the basis of digital data processing and storage within computer systems.


To transmit characters such as letters, digits, and symbols (A-Z, a-z, 0-9, $, %, etc.), an encoding is necessary. The ASCII encoding standard, developed over 60 years ago, assigns each character on an English keyboard a unique decimal value. To keep storage efficient, ASCII uses 7 bits per character, which yields 128 possible values (0 through 127). This approach also contributed to the concept of bytes, which consist of 8 adjacent bits: binary-based systems favor powers of 2, and 8 bits is the power of 2 that covers the 7 bits required for ASCII. The diagram below illustrates the 7 bit positions and their corresponding base-two values. Setting specific bit positions to 1 determines the ASCII value. The decimal value depicted here is a numeric aid for understanding the 128-value range; the character assigned to each bit sequence is defined in the ASCII standard.


Here is an example of encoding the ASCII characters "DOG" (D, O, G) into binary. The ASCII chart on the upper right indicates that the value for D is 68, O is 79, and G is 71. At the bottom, the binary representation of those values is provided. Also note how the seven ASCII bit positions sit within the 8-bit byte: the additional leading bit is always 0.
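
The same conversion can be reproduced in a few lines of Python. This is only an illustrative sketch using the built-in ord and format functions; the variable names are made up for this example.

```python
# Convert each character of "DOG" to its ASCII value and its 8-bit binary form.
text = "DOG"

rows = [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]
for ch, code, bits in rows:
    print(f"{ch} = {code} = {bits}")

# Joining the three bytes gives one continuous 24-bit stream.
bit_stream = "".join(bits for _, _, bits in rows)
print(bit_stream)
```

The "08b" format pads each value to 8 binary digits, which is why every byte starts with the extra leading 0.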



Base64 encoding works in the opposite direction: instead of turning text into bits, it turns arbitrary bits into text. In certain scenarios, a lengthy sequence of binary data must be carried over a protocol or medium that cannot handle binary formats and requires simple text. Base64 represents binary data using ASCII characters, which is essential for text-based formats and protocols like HTML or MIME. This is particularly useful when transmitting binary data, such as images or executable files, within a text-based framework that has no other way to carry it. The diagram below illustrates a basic example of the binary image "crazy rabbit" being converted into Base64 format. In the web page, the image is rendered from its Base64 representation embedded directly in the markup (typically as a data: URI) instead of from a link to the image file.
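
As a rough sketch of how binary data ends up inside a text-based page, the snippet below builds a data: URI with Python's standard base64 module. The helper name to_data_uri is hypothetical, and the 8-byte PNG file signature stands in for a real image, since the "crazy rabbit" file itself is not available here.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Return a data: URI that embeds the given bytes as Base64 text."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in binary data: the 8-byte signature found at the start of every PNG file.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

uri = to_data_uri(png_signature)
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

A useful side effect for analysts: Base64-encoded PNG data always begins with "iVBORw0KGgo", so that prefix is a quick hint about what an encoded blob contains.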



Base64 encoding is also commonly used for transmitting digital certificates. Microsoft offers a tool called certutil, documented at https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/cc732443(v=ws.11) which is capable of converting digital certificates to Base64 encoding, and decoding Base64 certificates back to binary format.



Base64 works on all 8 bits of each input byte, and unlike ASCII, whose first 32 positions are reserved for control characters, every symbol in its alphabet is a printable character. The Base64 alphabet consists of uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), the plus sign (+), and the forward slash (/). The equals sign (=) also appears in Base64 output, but only as padding; it is not one of the 64 symbols. Adding up the counts of these characters results in a total of 64:


  • A-Z: 26

  • a-z: 26

  • 0-9: 10

  • + and /: 2


26 + 26 + 10 + 2 = 64.
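
The count above can be checked by building the alphabet in Python. This is a small sketch using constants from the standard string module; the variable name alphabet is chosen for this example.

```python
import string

# Standard Base64 alphabet in index order: A-Z, a-z, 0-9, then "+" and "/".
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(alphabet))               # 64
print(alphabet[0], alphabet[63])   # A /
print("=" in alphabet)             # False: "=" is padding, not a symbol
```

The position of a character in this string is the 6-bit value it stands for, which is the mapping used in the worked examples.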


Following the previous example, we can convert "DOG" to Base64. First, convert "DOG" to ASCII:


  • D = 68

  • O = 79

  • G = 71


Next, convert each ASCII value to 8-bit binary:


  • D=68=01000100

  • O=79=01001111

  • G=71=01000111


Concatenating these bytes gives the binary stream:


010001000100111101000111


The binary stream is divided into groups of six bits.


010001 000100 111101 000111


Each 6-bit binary group is converted to decimal:


010001=17

000100=4

111101=61

000111=7


Base64 has its own conversion mapping from these values to characters: 17 is R, 4 is E, 61 is 9, and 7 is H. The Base64 representation of DOG is therefore RE9H.
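
The steps above can be sketched in Python and compared against the standard library's base64.b64encode. The names ALPHABET, groups, and indices are illustrative only.

```python
import base64
import string

# Index-ordered Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = b"DOG"

# 1. Bytes to one 24-bit stream.
bit_stream = "".join(format(byte, "08b") for byte in data)

# 2. Split the stream into 6-bit groups.
groups = [bit_stream[i:i + 6] for i in range(0, len(bit_stream), 6)]

# 3. Convert each group to its decimal value.
indices = [int(group, 2) for group in groups]

# 4. Look each value up in the Base64 alphabet.
encoded = "".join(ALPHABET[i] for i in indices)

print(groups)    # ['010001', '000100', '111101', '000111']
print(indices)   # [17, 4, 61, 7]
print(encoded)   # RE9H
print(encoded == base64.b64encode(data).decode("ascii"))  # True
```

This version only handles input whose length is a multiple of 3 bytes, which is the case for "DOG".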



Base64 encoding divides input data into groups of 24 bits (3 bytes), and each group is encoded as four 6-bit values, one output character each. If the input length is an exact multiple of 3 bytes, no padding is necessary. If 1 or 2 bytes are left over, the final group is filled out with zero bits to the next 6-bit boundary, and the output is padded with equals signs (=) to a full four characters: two for 1 leftover byte, one for 2 leftover bytes. Therefore, a string ending with one or two equals signs is a strong indicator that the data is Base64 encoded, although Base64 data does not always end with padding.
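
The relationship between input length and padding can be observed with the standard library. In this sketch, padding_count is a hypothetical helper and "DOGsx" is just an extra sample string added for illustration.

```python
import base64


def padding_count(data: bytes) -> int:
    """Return how many '=' characters end the Base64 encoding of data."""
    encoded = base64.b64encode(data).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))


for sample in (b"DOG", b"DOGs", b"DOGsx"):
    encoded = base64.b64encode(sample).decode("ascii")
    print(sample, len(sample) % 3, encoded, padding_count(sample))
```

Inputs of 3, 4, and 5 bytes leave 0, 1, and 2 leftover bytes, and produce 0, 2, and 1 padding characters respectively.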


We can see this in action if we Base64 encode the string DOGs. Lowercase "s" is 115 in the ASCII chart. Represented as an 8-bit byte (7-bit ASCII with a leading 0), that is 01110011.



Now the full binary string of DOGs is:


01000100010011110100011101110011


Divided into 6-bit groups:


010001 000100 111101 000111 011100 11


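Remember the larger 24-bit grouping: Base64 emits output in blocks of four characters, one block per 24 bits of input. The final group, 11, holds only two bits, so it is filled with zeroes to make a complete 6-bit group (110000). The second 24-bit block then has only two 6-bit groups that carry data; its remaining two positions are empty and are represented by the padding character (=).


010001 000100 111101 000111 <-- 1st 24 bits | 2nd 24 bits --> 011100 110000 (pad) (pad)


Now the decimal values of the second 24-bit block, divided into four 6-bit positions:


011100=28 (c)

110000=48 (w)

(no data): padding (=)

(no data): padding (=)


The result is RE9Hcw==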
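
For readers who want to trace the padding logic end to end, here is a minimal hand-rolled encoder, checked against the standard library. It is a teaching sketch (manual_b64encode is a made-up name), not something to use in place of base64.b64encode.

```python
import base64
import string

# Index-ordered Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def manual_b64encode(data: bytes) -> str:
    """Encode bytes to Base64 step by step, including '=' padding."""
    bits = "".join(format(byte, "08b") for byte in data)

    # Zero-fill the last partial 6-bit group, if there is one.
    if len(bits) % 6:
        bits += "0" * (6 - len(bits) % 6)

    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]

    # Pad the output with '=' up to a multiple of four characters.
    return "".join(chars) + "=" * (-len(chars) % 4)


print(manual_b64encode(b"DOGs"))  # RE9Hcw==
print(manual_b64encode(b"DOGs") == base64.b64encode(b"DOGs").decode("ascii"))  # True
```

Note that the zero fill and the "=" padding are separate steps: the zeroes complete the last character that carries data, while "=" marks positions that carry none.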



What do you do if you see Base64 encoding?


Firstly, remain calm. The presence of Base64 encoding does not inherently suggest malicious activity, as there are numerous legitimate uses for it. However, having a foundational understanding of Base64 is crucial for contextualizing its presence and effectively decoding it when necessary.



Understanding the context of Base64 encoding is crucial due to its prevalence in normal operations. The example below illustrates this point: a file was encoded into Base64 using the certutil tool in Windows. It's not the Base64 encoding itself that raises suspicion, but rather the use of certutil, an administrative tool. Questions should be raised: is the user an administrator who would typically use this tool? How would they typically use certutil? It's worth noting that certutil is cataloged by the Living Off The Land Binaries, Scripts and Libraries (LOLBAS) project, which documents built-in Windows tools that attackers may leverage on compromised systems. Similarly, Linux systems have a Base64 encoding/decoding utility called base64, which is documented in GTFOBins, the Linux counterpart of LOLBAS that catalogs common system administration tools attackers may abuse. More information about these projects can be found here:

https://lolbas-project.github.io/lolbas/Binaries/Certutil/

https://gtfobins.github.io/gtfobins/base64/



A widely used tool for decoding Base64 is CyberChef, available at https://gchq.github.io/CyberChef/. When you encounter Base64-encoded data and need to decode it, tools like CyberChef are invaluable for recovering the original content.
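
When a scriptable alternative is needed, Python's standard base64 module can do the same decoding. The sketch below is one possible approach, with a hypothetical helper name try_decode_base64; it uses strict validation so that strings containing characters outside the Base64 alphabet, or with incorrect padding, are rejected rather than silently decoded.

```python
import base64
import binascii
from typing import Optional


def try_decode_base64(candidate: str) -> Optional[bytes]:
    """Return the decoded bytes, or None if candidate is not valid Base64."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return None


print(try_decode_base64("RE9Hcw=="))     # b'DOGs'
print(try_decode_base64("not base64!"))  # None
```

A successful decode only shows that the string is well-formed Base64. Whether the result is meaningful, or malicious, still depends on the context discussed above.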



References


Online Base64 encoders/decoders:





CyberChef



Overview of Base64:





How Base64 may be used for maliciousness:





Microsoft Certutil tool:



Base64.py - Tool from Didier Stevens: https://isc.sans.edu/diary/31470



