T·H·E  F·P·U

 		         I·N·S·T·R·U·C·T·I·O·N

                            T·U·T·O·R·I·A·L

You might be wondering why anyone would write or read a tutorial on FPU
instructions: they won't help in coding viruses, nor will they make my life
any easier. If that's what you really think, you are very wrong. These
instructions help a lot.

As the name suggests (FPU = Floating Point Unit), they handle not only
integers but decimals too. Believe me, these instructions will help you create
viruses which will go virtually undetected by the AVs (well, probably :)). How?

Just write an encryptor/decryptor or poly using/for these. Because some of the
AV emulators hang on emulating them or just think the file is not an infected
one, your creations may go unnoticed.

So let's get down to business:

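It's better to check whether your system handles FPU instructions, that is,
whether a coprocessor is available. One simple way is the instruction:
SMSW EAX
This stores the machine status word (the low 16 bits of CR0). Check the EM bit
(bit 2): if it is clear, FPU instructions run on real hardware; if it is set,
they trap so that software can emulate them. Bit 0 is PE, the protected mode
flag, and says nothing about the FPU. On a Pentium or later you can also use
CPUID function 1, which reports the FPU in bit 0 of EDX. If there is no FPU,
let's go watch a movie.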

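Coming to coprocessors: the 386 pairs with the 387 and the 486SX with the 487.
From the 486DX and the Pentium (586) onward the FPU is built into the processor
itself. Either way, this is the unit that handles the FPIs (floating point
instructions).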

Now comes IEEE standard 754. This is the standard (from the IEEE, implemented
by Intel's FPUs) for representing floating point (decimal) numbers.

They are coded as follows:

            S-Sign, E-Exponent, F-Fraction or simply S,E,F.

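The length of S is one bit (0 if the operand is positive and 1 otherwise).
Length of F is equal to: All_Bits - E_Length - 1.
A 32 bit single has 8 exponent bits and 23 fraction bits; a 64 bit double has
11 and 52. The 80 bit format has 15 exponent bits and stores the leading 1 of
the significand explicitly.

Usually the value of a floating point number is: (-1)^S * 1.F * 2^(E - bias)
where bias is 127 for singles and 1023 for doubles.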
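
To see the three fields for yourself, here is a small sketch in C (C rather
than assembly, so it runs on anything). It assumes your compiler's float is a
32 bit IEEE 754 single, which is the case on x86 but is still an assumption.
The names single_fields and split_single are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32 bit IEEE 754 single:
   1 sign bit, 8 exponent bits, 23 fraction bits. */
typedef struct {
    uint32_t sign;      /* 0 = positive, 1 = negative */
    uint32_t exponent;  /* stored (biased) exponent, bias is 127 */
    uint32_t fraction;  /* 23 fraction bits, without the implicit leading 1 */
} single_fields;

/* Split a float into S, E and F (assumes float is an IEEE 754 single). */
single_fields split_single(float value)
{
    uint32_t bits;
    single_fields f;

    memcpy(&bits, &value, sizeof bits);   /* copy the raw bits, no conversion */
    f.sign     = bits >> 31;              /* top bit */
    f.exponent = (bits >> 23) & 0xFFu;    /* next 8 bits */
    f.fraction = bits & 0x7FFFFFu;        /* low 23 bits */
    return f;
}
```

For 1.0 this gives S = 0, E = 127, F = 0. For -2.5 (which is -1.25 * 2^1) it
gives S = 1, E = 128, F = 200000h.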

For our ease, the FPU provides us with a register stack to do the calculations
or store the values, etc. The registers are:
                ST(0), ST(1), ST(2),....ST(7)
ST(0) is also represented as ST. These registers hold the floating point
numbers while the operation is being carried out.
Now let's see what happens when you load these registers. Every load pushes
the new value into ST, and every value loaded earlier moves one register
higher, up to ST(7). Loading a ninth value without popping anything overflows
the stack.
Something like this:

            load a : ST(0) = a; ST(1) = empty; ST(2) = empty.....
            load b : ST(0) = b; ST(1) = a; ST(2) = empty.....
            load c : ST(0) = c; ST(1) = b; ST(2) = a.....

I hope things are getting clear; if not, read this part again.
The most used registers are ST(0) and ST(1).
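
If the push behaviour is still fuzzy, the following C sketch models it. It is
only a toy: it keeps eight values and a TOP index the way the real FPU does,
but it ignores the tag word, so stack overflow and underflow are not detected.
All the names (x87_model, x87_push and so on) are invented for this example.

```c
#define X87_REGS 8   /* the FPU stack has eight registers, ST(0)..ST(7) */

typedef struct {
    double reg[X87_REGS];  /* physical registers R0..R7 */
    int    top;            /* which physical register is currently ST(0) */
} x87_model;

/* Start with every register zeroed and TOP at 0 (toy version of FINIT). */
void x87_init(x87_model *fpu)
{
    int i;
    for (i = 0; i < X87_REGS; i++)
        fpu->reg[i] = 0.0;
    fpu->top = 0;
}

/* Push, as FLD/FILD do: TOP steps down by one (wrapping around), then the
   value is written to the new ST(0). Nothing else is copied. */
void x87_push(x87_model *fpu, double value)
{
    fpu->top = (fpu->top + X87_REGS - 1) % X87_REGS;
    fpu->reg[fpu->top] = value;
}

/* Pop, as FSTP/FISTP do: read ST(0), then TOP steps up by one. */
double x87_pop(x87_model *fpu)
{
    double value = fpu->reg[fpu->top];
    fpu->top = (fpu->top + 1) % X87_REGS;
    return value;
}

/* Read ST(i), counted from the current TOP. */
double x87_st(const x87_model *fpu, int i)
{
    return fpu->reg[(fpu->top + i) % X87_REGS];
}
```

After x87_push of a, then b, then c, x87_st(&fpu, 0) is c, x87_st(&fpu, 1) is
b and x87_st(&fpu, 2) is a, which matches the load listing above.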
Below is a table of the most used FPU instructions, taken from TECHHELP.



                  ┌─────────────────────────────┐
                  │ Data Transfer and Constants │
                  └─────────────────────────────┘


FLD src         Load real: push src, st(0) = src (mem32/mem64/mem80 or st(i))
FILD src        Load integer: push src, st(0) = src (mem16/mem32/mem64)
FBLD src        Load BCD: push src, st(0) = src (mem80)

FLDZ            Load zero: st(0) = 0.0
FLD1            Load 1: st(0) = 1.0
FLDPI           Load pi: st(0) = π (ie, pi)
FLDL2T          Load log2(10): st(0) = log2(10)
FLDL2E          Load log2(e): st(0) = log2(e)
FLDLG2          Load log10(2): st(0) = log10(2)
FLDLN2          Load loge(2): st(0) = loge(2)

FST dest        Store real: dest = st(0) (mem32/mem64 or st(i))
FSTP dest       dest = st(0) (mem32/mem64/mem80 or st(i)) and pop stack
FIST dest       Store integer: dest = st(0) (mem16/mem32)
FISTP dest      dest = st(0) (mem16/mem32/mem64) and pop stack
FBSTP dest      Store BCD: dest = st(0) (mem80) and pop stack
                (a BCD store always pops; there is no FBST instruction)

                             ┌─────────┐
                             │ Compare │
                             └─────────┘

The compares set the condition codes C0, C2 and C3 in the FPU status word, not
the CPU flags. Use FSTSW AX followed by SAHF before a conditional jump.

FCOM            Compare real: set condition codes as for st(0) - st(1)
FCOM op         Set condition codes for st(0) - op (mem32/mem64 or st(i))
FCOMP op        Compare st(0) with op (reg/mem); pop stack
FCOMPP          Compare st(0) with st(1) and pop stack twice
FICOM op        Compare integer: set codes for st(0) - op (mem16/mem32)
FICOMP op       Compare st(0) with op (mem16/mem32) and pop stack

FTST            Test for zero: Compare st(0) with 0.0

FUCOM st(i)     Unordered Compare: st(0) to st(i) [387]
FUCOMP st(i)    Compare st(0) with st(i) and pop stack
FUCOMPP         Compare st(0) with st(1) and pop stack twice

FXAM            Examine: st(0) (set condition codes)

                       ┌────────────┐
                       │ Arithmetic │
                       └────────────┘

In MASM/TASM the no-operand forms FADD, FSUB, FMUL and FDIV are assembled as
the pop forms on st(1),st: the result is written to st(1), the stack is
popped, and so the result ends up in st(0).

FADD            Add real: st(1) = st(1) + st(0) and pop stack
FADD src        st(0) = st(0) + src (mem32/mem64 or st(i))
FADD st(i),st   st(i) = st(i) + st(0)
FADDP st(i),st  st(i) = st(i) + st(0) and pop stack
FIADD src       Add integer: st(0) = st(0) + src (mem16/mem32)

FSUB            Subtract real: st(1) = st(1) - st(0) and pop stack
FSUB src        st(0) = st(0) - src (reg/mem)
FSUB st(i),st   st(i) = st(i) - st(0)
FSUBP st(i),st  st(i) = st(i) - st(0) and pop stack
FSUBR st(i),st  Subtract Reversed: st(i) = st(0) - st(i)
FSUBRP st(i),st st(i) = st(0) - st(i) and pop stack
FISUB src       Subtract integer: st(0) = st(0) - src (mem16/mem32)
FISUBR src      Subtract Rvrsd int: st(0) = src - st(0) (mem16/mem32)

FMUL            Multiply real: st(1) = st(1) * st(0) and pop stack
FMUL st(i)      st(0) = st(0) * st(i)
FMUL st(i),st   st(i) = st(0) * st(i)
FMULP st(i),st  st(i) = st(0) * st(i) and pop stack
FIMUL src       Multiply integer: st(0) = st(0) * src (mem16/mem32)

FDIV            Divide real: st(1) = st(1) ÷ st(0) and pop stack
FDIV st(i)      st(0) = st(0) ÷ st(i)
FDIV st(i),st   st(i) = st(i) ÷ st(0)
FDIVP st(i),st  st(i) = st(i) ÷ st(0) and pop stack
FIDIV src       Divide integer: st(0) = st(0) ÷ src (mem16/mem32)
FDIVR st(i),st  Divide Rvrsd real: st(i) = st(0) ÷ st(i)
FDIVRP st(i),st st(i) = st(0) ÷ st(i) and pop stack
FIDIVR src      Divide Rvrsd int: st(0) = src ÷ st(0) (mem16/mem32)

FSQRT           Square Root: st(0) = sqrt st(0)

FSCALE          Scale by power of 2: st(0) = st(0) * 2 ^ INT( st(1) )

FXTRACT         Extract exponent: st(0) is replaced by its exponent, then
                the significand is pushed, so st(1) = exponent and
                st(0) = significand

FPREM           Partial remainder: st(0) = st(0) MOD st(1), quotient truncated
FPREM1          Partial Remainder (IEEE): same as FPREM, but the quotient is
                rounded to nearest as the IEEE standard requires [387]

FRNDINT         Round to integer: st(0) = INT( st(0) ), rounded as selected
                by the RC field of the control word (default: to nearest)

FABS            Get absolute value: st(0) = ABS( st(0) ), clears the sign
FCHS            Change sign: st(0) = -st(0)

                     ┌────────────────┐
                     │ Transcendental │
                     └────────────────┘

FCOS            Cosine: st(0) = COS( st(0) ), angle in radians [387]
FPTAN           Partial tangent: st(0) = TAN( st(0) ), then 1.0 is pushed,
                so st(1) = tangent and st(0) = 1.0
FPATAN          Partial Arctangent: st(1) = ATAN( st(1) ÷ st(0) ); pop stack
FSIN            Sine: st(0) = SIN( st(0) ), angle in radians [387]
FSINCOS         Sine and Cosine: st(0) = SIN( st(0) ), then the cosine is
                pushed, so st(1) = sine and st(0) = cosine [387]

F2XM1           Calculate (2 ^ x)-1: st(0) = [2 ^ st(0)] - 1, for st(0)
                between -1.0 and +1.0

FYL2X           Calculate Y * log2(X): st(0) is X, st(1) is Y; this replaces
                st(1) with st(1) * log2( st(0) ) and pops the stack

FYL2XP1         Calculate Y * log2(X+1): st(0) is X, st(1) is Y; this replaces
                st(1) with st(1) * log2( st(0)+1 ) and pops the stack

                       ┌───────────────────┐
                       │ Processor Control │
                       └───────────────────┘

FINIT           Initialize FPU
FSTSW AX        STore Status Word in AX, ie. AX = FPU SW
FSTSW dest      dest = FPU SW (mem16)

FLDCW src       LoaD Control Word: FPU CW = src (mem16)
FSTCW dest      STore Control Word: dest = FPU CW (mem16)

FCLEX           Clear exceptions

FSTENV dest     STore ENVironment: stores status, control and tag words
                and exception pointers into memory at dest

FLDENV src      LoaD ENVironment: loads environment from memory at src

FSAVE dest      Store FPU state: store FPU state into 94 bytes at dest
                (108 bytes in 32 bit protected mode), then reinitialize

FRSTOR src      Load FPU state: restore FPU state as saved by FSAVE

FINCSTP         Increment FPU stack ptr: old st(1) becomes st(0), old st(2)
                becomes st(1), ..., old st(0) becomes st(7)

FDECSTP         Decrement FPU stack ptr: old st(0) becomes st(1), old st(1)
                becomes st(2), ..., old st(7) becomes st(0)
        The above 2 instructions only rotate the register numbering. No
        values are moved or popped, and the used/empty tags stay as they were.

FFREE st(i)     Mark reg st(i) as unused

FNOP            No operation: st(0) = st(0), equivalent to nop.

WAIT/FWAIT      Synchronize FPU & CPU: CPU waits until the FPU finishes its
                current opcode and any pending exception is handled.

FXCH st(i)      eXCHange instruction: st(0) <-> st(i), similar to xchg.
                Plain FXCH swaps st(0) and st(1).
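
FINCSTP and FDECSTP confuse a lot of people, so here is a tiny C sketch of
what they do to the numbering. The FPU keeps a 3 bit TOP field that says
which physical register is ST(0); the two instructions change only that field.
The function names are invented for this example, and the model leaves out
the tag word completely.

```c
#define X87_REGS 8   /* eight physical registers, TOP wraps around modulo 8 */

/* FINCSTP: TOP goes up by one, so the old ST(1) is now called ST(0). */
int x87_fincstp(int top)
{
    return (top + 1) % X87_REGS;
}

/* FDECSTP: TOP goes down by one, so the old ST(0) is now called ST(1). */
int x87_fdecstp(int top)
{
    return (top + X87_REGS - 1) % X87_REGS;
}

/* Physical register that ST(i) refers to for a given TOP. */
int x87_physical(int top, int i)
{
    return (top + i) % X87_REGS;
}
```

With TOP = 0, ST(0) is physical register 0. After x87_fincstp the same
register is reached as ST(7), and what used to be ST(1) is now ST(0).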

Suppose you want to calculate something like:

                tan(cos(sin(a*b+c*d)))/4

What do we do? Simple, see below:

First things first: you'll notice below that for loading I have used FILD
instead of FLD, because FLD expects a real number in IEEE 754 format. What I
mean is this: suppose the variable Var holds the integer 12345678h. When you
use FLD on Var, ST(0) gets loaded with something like 5.69e-28 (pretty nasty,
huh?). But when you use FILD, ST(0) gets loaded with 12345678h, and that's
what we want.
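
The difference is easy to reproduce in C. This sketch takes the same 32 bits
and reads them both ways. It assumes float is a 32 bit IEEE 754 single (true
on x86); load_as_integer and load_as_single are names made up for the example
and only mimic what FILD and FLD do with a dword operand.

```c
#include <stdint.h>
#include <string.h>

/* Like FILD dword: the operand is an integer and is converted to a real. */
double load_as_integer(int32_t raw)
{
    return (double)raw;
}

/* Like FLD dword: the same 32 bits are taken as an IEEE 754 single as is. */
double load_as_single(int32_t raw)
{
    float f;
    memcpy(&f, &raw, sizeof f);   /* reinterpret the bits, no conversion */
    return (double)f;
}
```

For 12345678h, load_as_integer gives 305419896.0 while load_as_single gives
roughly 5.69e-28, because the bits decode to S = 0, E = 36 and a tiny power
of two.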

So here's the algo:

                finit                   ;reset FPU: empty stack, default CW
                fild dword ptr [a]      ;ST(0) = a, we work in dwords
                fild dword ptr [b]      ;ST(0) = b, ST(1) = a
                fmul                    ;ST(0) = a*b (multiply and pop)
                fild dword ptr [c]      ;ST(0) = c, ST(1) = a*b
                fild dword ptr [d]      ;ST(0) = d, ST(1) = c, ST(2) = a*b
                fmul                    ;ST(0) = c*d, ST(1) = a*b
                fadd                    ;ST(0) = a*b+c*d (add and pop)
                fsin                    ;sin of ST(0), angle in radians
                fcos                    ;cos of ST(0)
                fptan                   ;ST(1) = tan of old ST(0), ST(0) = 1.0
                fstp st(0)              ;throw away the 1.0 pushed by fptan
                mov Var2, 4h            ;Var2 = 4
                fidiv dword ptr [Var2]  ;ST(0) = tan(cos(sin(a*b+c*d)))/4
                fstp dword ptr [Var]    ;Var = result (32 bit real), pop

Pretty simple, and pretty long, huh?
Something you always have to remember is to put .387 (or .487, .587) after
the processor directive. Like below:
                .386
                .387
                .data
                        [...]
                .code
                        [...]
                 End

This enables the FPU instructions alongside the 32 bit registers and
instructions selected by .386.

If you have any problems, email me at: aks8586@yahoo.com

-Surya/powerdryv
-23rd June, 2004
					I inspire....