The fight between AV companies and malware authors escalates every day. Both sides dedicate a lot of time to researching and implementing ways to detect malware or to avoid detection, depending on which side they are on. Most malware research concentrates on the infection mechanisms of the malware. Other points of focus include the techniques the malware uses to communicate with its creator, while the anti-virus evasion techniques the malware uses in the first place are often passed over completely.

This article aims to dig inside the loader used by the Matsnu malware family in order to deploy itself and avoid detection by AV products. Fortunately, at this point the variant is already detected by most AV vendors.

In my job as a malware analyst, I very often hear this kind of AV evasion technique described as a “packer”. In a very abstract way, this might be true, but technically it really isn’t. From my experience with packers and manual unpacking, I expect a packer to incorporate some compression algorithm and most probably an encryption algorithm (custom or not). Furthermore, the behaviour of a packer is usually quite different: it will decompress and decrypt the code of the original executable and then jump to its original entry point (OEP).

I prefer to call the “packers” used by more and more malware authors loaders, because of how they work technically. These loaders will usually launch a child process in suspended mode, overwrite its memory with the decrypted code of the malware, and then resume its main thread. Some of them, instead of overwriting the child’s memory, allocate extra memory in the child process and place the decrypted viral code there. Such a loader might then inject a thread into the child process, with its start address at the beginning of the allocated memory where the viral code is placed.
Some others might overwrite themselves through a code stub written into an extra chunk of allocated memory and then jump back to the PE image address space. In addition, malware authors will very often first compress the original viral code using a common packer (such as UPX, PECompact etc.) and then encrypt it and embed it inside the loader.

From a technical point of view, it is quite fair to distinguish these two types of mechanisms, and even if we keep calling them all “packers” for simplicity, it is necessary to understand the differences between them.

The final goal of this article is to isolate a fully working executable of the original malware from underneath the various anti-AV protection layers.

Self-Decryption Stage I

A big part of the loader’s code is decrypted at run-time through a “slow” decryption algorithm which performs a lot of operations in each loop, decrypting the code dword by dword.

The outer loop:

00401752 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]
00401755 83C1 01 ADD ECX, 1
00401758 894D F0 MOV DWORD PTR SS:[EBP-10], ECX
0040175B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter
00401762 7D 5E JGE SHORT 004017C2 ← exit the loop once finished
…more code here
0040178F E8 D7040000 CALL 00401C6B ← call to the decryption routine
…more code here
004017C0 EB 90 JMP SHORT 00401752 ← jump up to loop start

Inside the Decryption Routine:

Some additional loops take place here, but the important instruction is the one that writes the result each time, a dword stored in the ECX register, to the memory location pointed to by the EAX register:

00401ED8 8908 MOV DWORD PTR DS:[EAX], ECX ← the initial value in EAX is 00408584; it is incremented by a dword in each iteration

Self-Decryption Stage II

When the outer loop mentioned above has finished, another one takes place a few instructions later.
004017DE 8B4D E0 MOV ECX, DWORD PTR SS:[EBP-20]
004017E1 83C1 05 ADD ECX, 5
004017E4 894D E0 MOV DWORD PTR SS:[EBP-20], ECX
004017E7 817D E0 DF0C0 CMP DWORD PTR SS:[EBP-20], 0CDF ← check counter
004017EE 7D 77 JGE SHORT 00401867 ← exit the loop
…more code here
00401862 E9 77FFFFFF JMP 004017DE ← jump up to loop start

Self-Decryption Stage III

One more loop follows during the self-decryption stage.

0040187E BA 01000000 MOV EDX, 1
00401883 85D2 TEST EDX, EDX
00401885 0F84 D2000000 JE 0040195D

The three instructions above create a fake execution flow redirection. Since the value 1 is always loaded into the EDX register, TEST EDX, EDX never sets the zero flag, so the conditional JE jump that follows is never taken and has no effect on the execution flow.

0040188B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter
00401892 0F85 A1000000 JNZ 00401939 ← if not equal, jump to increase_counter

Some more code is presented below:

increase_counter:
00401939 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]
0040193C 83C1 01 ADD ECX, 1
0040193F 894D F0 MOV DWORD PTR SS:[EBP-10], ECX

enter_next_decryption_routine:
00401942 68 F7480700 PUSH 748F7
00401947 68 18194F00 PUSH 4F1918
0040194C 8B55 F4 MOV EDX, DWORD PTR SS:[EBP-C]
0040194F 52 PUSH EDX
00401950 E8 4A000000 CALL 0040199F ← call decryption routine
00401955 83C4 0C ADD ESP, 0C
00401958 E9 21FFFFFF JMP 0040187E ← jump to loop start

Inside the Decryption Routine:

Some more loops take place here, but the important instruction is again the one that writes the result each time, a dword stored in the ECX register, to the memory location pointed to by the EAX register:

00401B70 8908 MOV DWORD PTR DS:[EAX], ECX ← the initial value in EAX is 00408584; it is incremented by a dword in each iteration

Self-Decryption Stage IV

Going back to the loop outside the decryption function, we saw the condition which would normally signal the end of the looping process.
It is fake, and we need to examine it more carefully in order to locate the next step. Indeed, when the conditions are correct, the execution will reach a CALL instruction:

0040191D E8 8FF8FFFF CALL 004011B1

The CALL to the beginning of the previously encrypted code is located inside this function:

004013B6 FF15 108B4000 CALL NEAR DWORD PTR DS:[408B10] ← the value stored at this address is 00408584

Once we enter the function at address 00408584 we see the following:

00408584 E8 07000000 CALL 00408590
00408589 75 3A JNZ SHORT 004085C5

Note the obfuscation trick in the first instruction, which confuses the disassembling engine. The CALL transfers execution to 00408590, which is the last byte of the instruction that the disassembler shows starting at address 0040858B. This means that the bytes between the CALL and its target (00408589 to 0040858F) are junk bytes in this case.

0040858B 03A0 21D64F5B ADD ESP, DWORD PTR DS:[EAX+5B4FD621]
00408591 81EB 05103A00 SUB EBX, 3A1005
00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E]
0040859D B9 8B020000 MOV ECX, 28B
004085A2 66BF 7592 MOV DI, 9275
004085A6 66313E XOR WORD PTR DS:[ESI], DI
004085A9 6683C7 02 ADD DI, 2
004085AD 83C6 02 ADD ESI, 2
004085B0 E2 F4 LOOPD SHORT 004085A6
004085B2 FC CLD
004085B3 7E 2A JLE SHORT 004085DF
004085B5 1B95 CFF6215C SBB EDX, DWORD PTR SS:[EBP+5C21F6CF]
004085BB 8745 92 XCHG DWORD PTR SS:[EBP-6E], EAX
004085BE D7 XLAT BYTE PTR DS:[EBX+AL]
004085BF 1F POP DS
004085C0 30D5 XOR CH, DL
004085C2 94 XCHG EAX, ESP

This is what we see once we execute the CALL instruction:

00408590 5B POP EBX ← EBX = 00408589, the return address pushed by the CALL
00408591 81EB 05103A00 SUB EBX, 3A1005
00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E] ← starts from address 004085B2
0040859D B9 8B020000 MOV ECX, 28B ← loop counter
004085A2 66BF 7592 MOV DI, 9275 ← initial decryption key
004085A6 66313E XOR WORD PTR DS:[ESI], DI ← decrypt by XORing with the key in DI (initially 9275), one word in each iteration
004085A9 6683C7 02 ADD DI, 2
004085AD 83C6 02 ADD ESI, 2
004085B0 E2 F4 LOOPD SHORT 004085A6

The above decryption algorithm decrypts an extra portion of code, starting from the instruction located immediately after the LOOPD.

So far we have seen the various steps this loader uses to decrypt the next parts of its code. Now it’s time to continue with the rest of its mechanisms.

Dynamic Imports Resolving & PEB Loader Data Structure

Normally, malware authors retrieve the VAs of the APIs by using two Windows APIs, LoadLibrary and GetProcAddress. These are employed in order to avoid detection through the imports normally listed inside the imports table. However, in this case the author of the loader has decided to go through the PEB (Process Environment Block) loader data structure, PEB_LDR_DATA, in order to retrieve the necessary information, which is a stealthier way to retrieve the VAs of the necessary APIs. The pointer to this structure is located at PEB + 0x0C.

Back to where we stopped: immediately after the end of the decryption loop we find a CALL at address 004085CD, and on entering this function we see another CALL at address 004086EF. Inside that second function is where the loader accesses the PEB_LDR_DATA structure.

0040870E 64FF35 3000000 PUSH DWORD PTR FS:[30]
00408715 58 POP EAX

In the two instructions above, we notice another obfuscation attempt. Instead of pushing the address of the PEB onto the stack and then popping that value back into EAX, we could just do MOV EAX, DWORD PTR FS:[30].
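For orientation, the offsets used in this kind of PEB walk can be checked against a partial model of the 32-bit structures. The following Python sketch is only an illustration: the class names (Peb32, PebLdrData32, LdrModule32) and the helper walk_offsets are hypothetical, the field names follow commonly published reverse-engineered layouts rather than anything taken from the loader, and only the fields up to the ones of interest are modelled.

```python
import ctypes

# Assumption: 32-bit process, so every pointer is modelled as a 4-byte integer.
PTR32 = ctypes.c_uint32


class ListEntry32(ctypes.Structure):
    _fields_ = [("Flink", PTR32), ("Blink", PTR32)]


class UnicodeString32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", PTR32)]


class Peb32(ctypes.Structure):
    # Partial layout: only the fields up to Ldr.
    _fields_ = [("Flags", ctypes.c_ubyte * 4),
                ("Mutant", PTR32),
                ("ImageBaseAddress", PTR32),
                ("Ldr", PTR32)]


class PebLdrData32(ctypes.Structure):
    # Partial layout: only the fields up to the load-order list head.
    _fields_ = [("Length", ctypes.c_uint32),
                ("Initialized", ctypes.c_uint32),
                ("SsHandle", PTR32),
                ("InLoadOrderModuleList", ListEntry32)]


class LdrModule32(ctypes.Structure):
    # Partial layout: only the fields up to BaseDllName.
    _fields_ = [("InLoadOrderModuleList", ListEntry32),
                ("InMemoryOrderModuleList", ListEntry32),
                ("InInitializationOrderModuleList", ListEntry32),
                ("BaseAddress", PTR32),
                ("EntryPoint", PTR32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UnicodeString32),
                ("BaseDllName", UnicodeString32)]


def walk_offsets():
    """Return the structure offsets that a load-order PEB walk relies on."""
    return {
        "PEB.Ldr": Peb32.Ldr.offset,
        "PEB_LDR_DATA.InLoadOrderModuleList":
            PebLdrData32.InLoadOrderModuleList.offset,
        "LDR_MODULE.BaseAddress": LdrModule32.BaseAddress.offset,
        "LDR_MODULE.BaseDllName.Buffer":
            LdrModule32.BaseDllName.offset + UnicodeString32.Buffer.offset,
    }


if __name__ == "__main__":
    for name, offset in walk_offsets().items():
        print(f"{name}: 0x{offset:02X}")
```

Because the Flink pointer is the first field of each list entry, reading a dword at the start of one module entry yields the address of the next entry in load order.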
The code then walks the loader data:

00408716 8B40 0C MOV EAX, DWORD PTR DS:[EAX+C] ← move to EAX the pointer to the PEB_LDR_DATA
00408719 8B48 0C MOV ECX, DWORD PTR DS:[EAX+C] ← move to ECX the pointer to the LDR_MODULE structure of the first module loaded by the Windows loader
0040871C 8B11 MOV EDX, DWORD PTR DS:[ECX] ← save to EDX the pointer to the LDR_MODULE structure of the next module loaded by the Windows loader
0040871E 8B41 30 MOV EAX, DWORD PTR DS:[ECX+30] ← move to EAX the pointer to the name of the first module loaded by the Windows loader

Then follows another CALL, at address 00408728, to a function dedicated to calculating a magic dword from the name of the currently examined module. If the dword matches the predefined constant, the loader knows it has found the loaded module it needs to continue.

Calculation Algorithm:

00408797 8A10 MOV DL, BYTE PTR DS:[EAX] ← go through all chars one by one
00408799 80CA 60 OR DL, 60 ← start dword calculation
0040879C 01D3 ADD EBX, EDX
0040879E D1E3 SHL EBX, 1 ← end dword calculation
004087A0 0345 10 ADD EAX, DWORD PTR SS:[EBP+10] ← increase the pointer to the name string by 2, because it’s stored as Unicode
004087A3 8A08 MOV CL, BYTE PTR DS:[EAX] ← move the next char value to CL
004087A5 84C9 TEST CL, CL ← check if it’s zero, which means we reached the end of the string
004087A7 E0 EE LOOPDNE SHORT 00408797 ← if it’s not, jump up to the loop for the next char
004087A9 31C0 XOR EAX, EAX ← zero out EAX
004087AB 8B4D 0C MOV ECX, DWORD PTR SS:[EBP+C] ← move the magic dword to ECX
004087AE 39CB CMP EBX, ECX ← check if calculated dword = magic dword
004087B0 74 01 JE SHORT 004087B3 ← if it is, the module has been located
004087B2 40 INC EAX
004087B3 5A POP EDX
004087B4 5B POP EBX
004087B5 59 POP ECX
004087B6 89EC MOV ESP, EBP
004087B8 5D POP EBP
004087B9 C2 0C00 RET 0C

Figure 1 shows the condition in which the two values match when checking the loaded kernel32.dll module.
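The calculation loop above can be reproduced offline, for example to test candidate module names against a constant recovered in the debugger. The following Python sketch rests on a few assumptions: EBX and the upper bytes of EDX are taken to be zero when the loop starts (the listing does not show their initialisation), the string step passed at [EBP+10] is 2, and only the low byte of each character is read. The function name module_name_checksum is hypothetical.

```python
def module_name_checksum(name: str) -> int:
    """Emulate the OR / ADD / SHL loop over a module name.

    Assumption: the accumulator (EBX) starts at zero and EDX holds
    nothing but the current character in DL.
    """
    ebx = 0
    for ch in name:
        dl = (ord(ch) & 0xFF) | 0x60       # MOV DL, [EAX] ; OR DL, 60
        ebx = (ebx + dl) & 0xFFFFFFFF      # ADD EBX, EDX (32-bit wrap)
        ebx = (ebx << 1) & 0xFFFFFFFF      # SHL EBX, 1   (32-bit wrap)
    return ebx


if __name__ == "__main__":
    for candidate in ("kernel32.dll", "KERNEL32.DLL", "ntdll.dll"):
        print(f"{candidate}: {module_name_checksum(candidate):08X}")
```

Because of the OR with 60, upper-case and lower-case letters contribute the same value, so under these assumptions the comparison does not depend on the case in which the module name is stored.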

Figure 1 – Kernel32.dll module located

Once the necessary module is located and we exit the previous function, we reach the next part of the code, which attempts to find the VAs of specific functions exported by kernel32.dll.

00408735 8B41 18 MOV EAX, DWORD PTR DS:[ECX+18] ← get the image base of kernel32.dll from the LDR_MODULE structure
00408738 50 PUSH EAX
00408739 8B58 3C MOV EBX, DWORD PTR DS:[EAX+3C] ← get the offset of its PE header
0040873C 01D8 ADD EAX, EBX
0040873E 8B58 78 MOV EBX, DWORD PTR DS:[EAX+78] ← get the RVA of its export table

Once the loader locates the export table of kernel32.dll, it uses it to retrieve the VAs of a few APIs, four in total, that it needs in order to proceed. Here is the table that is created at this stage:

00408AA5 760CBC8B kernel32.LoadLibraryExA
00408AA9 760D05F4 kernel32.VirtualAlloc
00408AAD 760C50AB kernel32.VirtualProtect
00408AB1 760D1837 kernel32.GetProcAddress

In the next instalment, I will begin by showing how to locate and isolate the embedded decrypted executable. Have fun!