|
Charles Douglas Wehner 30 December 2003
The art of good programming is to lose the plot. That is to say, if a program is written for ease of understanding by the programmers it will probably contain few ingenious tricks. If, however, several things are taking place at once, the CPU will work at breathtaking speed towards its goal. It is not necessary for the programmer to continue to remember what he wrote - it is quite sufficient that these details are encapsulated in the program. However, good documentation is essential for the maintenance of the system. An example of a good programming trick arose when a friend known as Stavros showed the author his university assignment. He had used a constant and a slope to find the first guess for the Newton-Raphson square-root. You MULTIPLY by the slope, and ADD the constant. This gives accuracy sufficient for IEEE Real 32 in just three turns of the loop. The author observed that it took perhaps TWO-AND-A-HALF turns. This cannot be capitalised upon because two would be insufficient whilst three is a little too good. The author decided to SHIFT THE PACKED R32 ONE PLACE RIGHT. Bizarre! And yet it works. Floating point consists of a number often called the MANTISSA (defined as something of little magnitude or importance) and an EXPONENTIATING number (often wrongly called the EXPONENT). The "EXPONENT" is usually binary. That is to say, it defines how many times the Mantissa must be shifted left or right in binary for it to become the correct number. For example, if E=4 and M=1.5 the number is 24. Each left shift doubles the Mantissa. Thus, we have 1.5 shifted to 3 6 12 and 24. The observant will notice that from this definition, the "Exponent" is actually the INTEGER of the LOGARITHM BASE 2 of the number. To cover the special case of the number being zero, it is usual for the Exponent to be set to zero. So we add an offset to the Exponent. As a result, zero binary shifts will not be confused with the number zero. In Real 32, this offset is 127. Notice how even Exponents have become odd and vice-versa due to the odd offset. It is usual for the Mantissa to be left-justified in binary. That means that the leftmost bit is ALWAYS set. So 1.5 has a Mantissa of 11000000 00000000 00000000. The leftmost bit means "1", the next means "0.5" and so on, halving with each step to the right. Because the leftmost bit is ALWAYS set, it is wasted. So once we have reached this stage (having packed the shifts with offset into the Exponent and having a left-justified Mantissa) we may as well RESET this bit when the number is positive, or LEAVE IT SET for negative numbers. This is the USUAL format of floating-point. NOT SO with IEEE math! With Real 32, the ENTIRE Exponent byte is shifted right until the bottom bit overlays the top bit of the Mantissa. The Signum (sign of the number) is specified by resetting the TOP of the Exponent for positive, or setting it for negative. You might now understand why the author dislikes IEEE math. You cannot reclaim your Exponent without bit-shifting. However, 1.5 with an odd number of binary shifts will have a Mantissa 11000000 00000000 00000000 whilst 1.5 with an even number of shifts has 01000000 00000000 00000000. Shift these right and you get 01100000 00000000 00000000 and 00100000 00000000 00000000 respectively. The "VIRTUAL BIT" cannot be shifted. Whatever happens, the leading bit always represents 1. So we have 11100000 00000000 00000000 and 10100000 00000000 00000000 for the two first-guesses. These numbers are 1.75 and 1.25 respectively. So we automatically generate a first guess for root 3 of 1.75. The correct root of 3 is about 1.732. Similarly, for root 1.5 we have 1.25. The correct root is about 1.2247. In addition, the powers of two are also halved. So 3 times 2 to the power 10 gives 1.75 times 2 to the power 5. 3 times 2 to the power 9 gives 1.25 times 2 to the power 4. These are not as good as the Stavros guesses - good for two and a half turns. These guesses are good for perhaps two-and-three-quarters. But we CANNOT go round the loop a non-integer number of times. We go three times around the loop as with Stavros. But we have replaced his MULTIPLY and ADD by a single 32-bit SHIFT. That is the name of the game. Lost the plot? The PROGRAM has not. I used this trick in many versions, and the computers always delivered a square root in record time. And if the program needs maintenance, one can always think it through again. It is very convenient for the lay user to have floating point for the generation of sines and cosines. However, the writers of commercial graphics programs usually construct their line-drawing routines from INTEGER mathematics. If they do not, the program runs too slowly. The inner core of floating point runs at the integer level. A central processor does not understand floating point. Registers contain binary switches. However, pressure from lazy programmers forced Intel to incorporate floating point routines into the Pentium range. This has the commercial advantage that users of the facilities find themselves locked into the Intel system, with programs that are not portable to other processors. In the design of the inner core of floating point, we generate routines that can equally well be applied to integer graphics systems. However, graphics require scaled-down versions because 32-bit precision is excessive.
Cosine
The cosine of a negative number is the same as that of the equivalent
positive number. Clear the signum (topmost bit in IEEE Real, topmost
of the Mantissa in other systems).
Now divide by Pi. Look at the lowest bit of the quotient. If it is set,
it represents half-cycles 2, 4, 8 etc. These half-cycles are the
mirror-image of the others. Use CMA to complement the accumulator,
or "XOR eax,-1" or similar. The numbers should now run backwards for
these half-cycles.
The Mantissa should represent PAIRS of quarter-cycles (quadrants).
The first quadrant is positive. The second quadrant gives a NEGATIVE
result, as well as running backwards.
So you shift left into carry, or "ADD eax,eax" to convert the Mantissa
into a single quadrant. Also, PRESERVE the carry. If it was set, you
will CURRENTLY complement the accumulator ("XOR eax,-1") and
LATER negate the result.
The numbers represent 0 to 0.99999999976716935634613037109375 of a
quadrant. They will have to be multiplied by 3373259426. This is the
fixed-point representation of Pi/4. The product will be in EDX:EAX
on a Pentium. So we move the EDX number into EBX for the following
routine:
Remember to remove the test loadings (MOV EBX,1686629713 &c.). The
program is 32 bit IBM-compatible only.
|
|
|
The display routine shows everything in the register. For full
precision, a 32-bit number must be displayed as 32 figures in any
number system other than powers of two. This does not mean that
32 decimal places are available. There are NINE decimal places,
suitable for an EIGHT-FIGURE display.
But you will ask "WHAT ARE THE MAGIC
NUMBERS"?
First, we have Horner`s rule. Isaac Newton used to evaluate polynomials
backwards:
This leads to the magic number e. The author did the same kind of thing,
but according to Professor Donald E Knuth, Horner published first.
Given Y=a plus bX plus cX-squared plus dX-cubed
Take d and multiply by X Given Y=a minus bX plus cX-squared minus dX-cubed
Take d and multiply by X So Horner`s rule says RUN THE POYNOMIAL BACKWARDS. Here we have the Cosine equation from Colin McLaurin and Brook Taylor, where Cos(X)= 1 - X-squared/2 + X-fourth/24 - X-sixth/ 720 etc. So we prepare X-squared by "mov eax,ebx ; mul ebx ; mov ebx,edx" and then use integers representing 1/2, 1/24 and so on - but in reverse order. They were made by taking 4294967296 and dividing it by 2. That result was divided by 3 and then by 4. This next result was divided by 5 and by 6. And so on. Only the integer part was kept. So if we multiply a 32-bit integer in EBX by the number 178956971 in EAX, for example, the product appears in EDX:EAX. Taking EDX down to EAX (transfering the number) has the effect of dividing by 4294967296. So we get 178956971/4294967296 or one 24th. But the design of a computer program depends on the parameters of the CPU. Does it waste time to load the instruction "mov eax,178956971"? Is it quicker just to load 178956971 by means of "LODSD"? Here is that same program as a loop. It takes up less space.
|
If it is desired not to divide by Pi/4, this can be done by altering
the coefficients. Here the coefficients are increased in size by
Pi-squared/16 once for the second coefficient, twice for the third,
three times for the fourth and so on.. However, it takes one more
turn of the loop, one more coefficient.
|
|
|
The Sine can be created by means of
Unfortunately, accuracy is lost for very small arguments. As a result,
one should first check that the argument is greater than that which
gives a different result to its argument. These very small arguments
are returned unchanged, and only the larger arguments resolved via
the polynomial.
There is a great deal that could be said on the subject, but as the
title says, these are the RUDIMENTS of floating point.
Here are a couple of small sinusoid programs (sine and cosine)
suitable for graphics or for the fast fourier transform of JPEG:
| |||||||
|
It will be seen that the sine was generated by evaluating
Thereafter, the result is multiplied by X. For a Fourier or line-drawing
program, it suffices to multiply by the SHIFTED version of X. For
example, if 10000000 0000000 0000000 represents the Mantissa of an
EIGHTH, 00010000 00000000 00000000 represents the shifted number.
The right-shift will have caused bits to fall off the register -
and these are lost forever. Accordingly, in floating point we would
keep the 10000000 00000000 00000000 for the final multiply,
whilst turning it into 00010000 00000000 00000000 to square to
00000010 00000000 00000000 for the routine.
When the result is multiplied by the unshifted Mantissa, the old
Exponent becomes the Exponent of the result. However, it is always
possible that 10000000 00000000 00000000 or whatever, when multiplied
in, gives a leading zero. Floating point requires that the Mantissa
is kept left-justified.
So in the 32-bit routines, we would do the following:
|
evaluate: push ebx SHIFT EBX ACCORDING TO EXPONENT mov eax,ebx mul ebx mov ebx,edx RUN THE HORNER`S RULE ROUTINE pop ebx mul ebx and edx,edx js continue add eax,eax adc edx,edx DECREMENT THE EXPONENT CREATE FP-NIL IF IT DOES NOT DECREMENT continue: |
|
Arctangent The arctangent can be created by means ofATN X = X - X-cubed/3 + X-fifth/5 &c. This has all the appearances of the sine, except for one small difference - there is no exclamation mark!!!!!!!!!!!!!!!!!! So the factorial convergence of the sinusoids has gone. Instead of dividing by 1, by 6, by 120, by 5040 and 362880, we use 1, 3, 5, 7 and 9. This is a disaster. The only solution is to use very small arguments. So if the number is about 1/2, we will be multiplying by about 1/4 at every "turn" of the loop. This delivers exponential convergence instead of factorial convergence. To begin with, the tangent of 45 degrees is unity. Numbers greater than unity can be turned into their RECIPROCALS, and the routine can be CALLED rather than jumped to. On return, we have the chance to subtract the result from 90 degrees to obtain the correct result. Numbers from zero to 1 (the tangents of zero to 45 degrees) have to be reduced in size. Fortunately, it can be shown that the tangent of 22.5 degrees is ROOT-2 MINUS 1. "IT CAN BE SHOWN" is the lazy mathematician`s cliche. OK. Here goes. The tangent of (A minus B) is the DIFFERENCE of Tan(A) and Tan(B), divided by Tan(A) TIMES Tan(B) plus 1.
So if A is 45 degrees, and B is 22.5 we have
There is a formula for resolving equations of the form This gives -2 +/- ROOT(4 + 4) / 2 Which is -1 +/- ROOT(2) So Tan(22.5) is 0.414213562, which when multiplied by 4294967296 gives 1779033704 as a magic number for our program. If the argument is less than this, we JUMP to the program. If it is more, we have to reduce the argument and CALL. After the call we will add 22.5 degrees to the result.
Here is a piece of that code: |
mov eax,1779033704 (Tan 22.5) sub ecx,eax (DIFFERENCE) mul ebx (PRODUCT OF ARGUMENT AND 0.414) stc (SET THE CARRY) rcr edx,1 (CREATES 1+PRODUCT /2) mov ebx,edx (SAVE IT) mov edx,ecx (FETCH PRODUCT OF ARGUMENT AND 0.414) div ebx (DIVIDE BY HALF 1+PRODUCT) clc (RESULT IS DOUBLE SIZE) rcr eax,1 (SO HALVE THE RESULT) |
|
This then will deliver a reduced argument, the tangent of the "ANGLE LESS 22.5", without the angle being known. On return, the result has 1686629713 added on. This is 22.5 degrees in radians, as a fraction of 4294967296.
When creating a giant laboratory mathematical system, for example, the constants of floating-point will not be known. It is the job of the programmer to conjure them up. The author wrote first a 32-decimal-place system, and then a 65-place one. You begin by creating a four-function system, just like the cheapest pocket calculator - but more accurate. Then you use it to create ROOT 2, by starting with 1.5 as a first guess. It does not matter how slow the system is whilst you are developing it. You just want the tangent of 22.5 degrees to the full precision. The author also generated the tangent of ELEVEN-AND-A-QUARTER degrees for use with the 32-place mathematical system, and FIVE-AND-FIVE-EIGHTH for the 65-place system
Here is the algorithm for finding the tangent of half the angle that
you have the tangent of:
Notice how the tangent of five and five-eighth degrees is just under one tenth. This means that it will converge by one decimal place of itself. As the arctangent polynomial uses odd powers of the argument - advancing in twos, that it - the polynomial will deliver two decimal places per coefficient when five and five-eighths degrees or less are used.
The snippet arctangents: mov eax,1779033704 sub eax,ebx jnc evaluate mov ecx,ebx mov eax,1779033704 sub ecx,eax mul ebx stc rcr edx,1 mov ebx,edx mov edx,ecx div ebx clc rcr eax,1 mov ebx,eax call evaluate add edx,1686629713 ret can be rewritten for the higher precision, with new MAGIC NUMBERS. These routines can be NESTED. So Pi/8, Pi/16 and Pi/32 are added to the angle as required by the math. Notice that the Mantissa of Pi/8, Pi/16 or Pi/32 is always that of Pi. It is slower but more accurate to use less nested argument-reduction routines. So when evaluating Pi for laboratory use, one might drive the big math from Basic, and use ROOT-TWO MINUS 1 only. When the big math has been built, the deeper levels of argument-reduction are added in order to speed it up.
|
|
Logarithm It was pointed out that the "Exponent" byte is the INTEGER of the logarithm to base 2 of a floating-point number. This can be reclaimed, and we only have the fractional part to find.The fractional part will be the LOG base 2 of the Mantissa. Put them both together, float the result and you have the LOG base 2. What is it good for? Well, logarithms are proportional to one another. Multiply any logarithm base 2 by 0.69314718 - the natural log of 2 - and you have converted that logarithm. The ANTILOG base 2 of 0.5 is 1.414213562, root 2. So if you multiply the Mantissa by itself and it exceeds 2, it will have contained root 2. If it takes TWO multiplications to reach 2, it contained the FOURTH root of 2, which is 0.25 in base-2 logarithms. So by repeated squaring of the Mantissa we can generate the LOG base 2 bit-by-bit:
If the processor has a very fast multiply routine, this is quite adequate. The concept is also good when building a giant math, because speed is not always important. The reader will have noticed that much of computer transcendental math consists of MAGIC NUMBERS. By means of the McLaurin-Taylor equation for antilogarithm one can obtain e to any accuracy. It will be 2.718281828459045 or thereabouts. The LOG base 2 of this gives the RECIPROCAL of the LOG base e of 2. Log base e of 2 was required to convert between bases, as described. So slow technology that needs no MAGIC NUMBERS becomes ENABLING TECHNOLOGY to find the magic numbers for the faster math. Supposing we take the logarithm of 1.5, of 1.25, of 1.125, of 1.0625 and so on. Suppose also that we multiply these by 4294967296. We get MAGIC NUMBERS. These represent the sizes of numbers in the Mantissa of a logarithm equivalent to multiplying by 1.5, by 1.25 and so on. In the following, 0.69314718 is represented by 2977044472. So we can expect a Mantissa of 1.999999999 to deliver this result. However, are there any GAPS? We have assumed that the Mantissa bytes are FFFFFFFF, but if we shift and add the Mantissa without causing an overflow, we have found a gap. The LOG base e of that shift-and-add must then be SUBTRACTED from the result. We leave the shift-and-add added into the argument because we do not want to rediscover the same gap. However, the Mantissa with its gap plugged needs to be re-shifted. Hence the "each" loop. To use this routine, you use the Mantissa of the number. You remove the offset from the binary Exponent, and multiply by 0.69314718. To this you add the result of this routine. Then you float the lot.
|
|
Will you please stop looking up there - I`m down here. Yes,
I know what has distracted you. It`s those numbers 1, 2, 4, 8....
They REALLY are powers of two. It REALLY is bizarre.
When a moiety is added to a logarithm, it is equivalent to multiplying
a number by 1 PLUS a moiety. But it is specifically the NATURAL
logarithm where for small moieties they are the SAME. So as the
number has more and more zeroes after the 1, as in 1.00000000xyz,
the logarithm tends to linearity (0.00000000xyz).
We can use this fact to make the routine run many times as fast.
All the gaps in the top two bytes will have been filled in after
just 16 loops, and the remaining ones can simply be transferred
across from the argument with add edx,ebx at the end of
the routine. Note the truncated MAGIC NUMBER series.
|
The importance of truncating the sequence may not be immediately
obvious until it is pointed out that the shift is often an
N-SQUARED function.
There exists the shift instruction
on some Intel CPUs. This will presumably wire the input at edx directly
to the output at edx. It might just take one or two ticks of the
system clock. Without any reference books to guide him, the author
tried it and it failed to work.
In order to make progress, the author wrote
So the number of shifts, if all possible shifts took place, would be
(N)(N+1)/2, which for 32 is 528 but for 16 is just 136. This means that
the second program is almost four times faster.
|
|
Antilogarithm We have seen how to generate a cosine. This uses the McLaurin-Taylor equation. There is an exact equivalent for the natural antilogarithm:Exp(X) = 1/0! + X/1! + X-squared/2! + X-cubed/3! + .... Amended 3 December 2006 We would prepare the magic numbers by dividing 4294967296 by the factorial of 12, of 11, of 10, and so on down to 2. The cosine program itself can be used, but with the code "mov eax,ebx, mul ebx, mov ebx,edx" removed. The number of multiplications is extended to twelve. A reversal of the above shift-and-add method is more elegant. One cannot give hard and fast rules as to which is the BETTER, because much depends upon the parameters of the processor. In the following program, we do not fill in the gaps in the logarithm but subtract the logarithms of 1.5, 1.25, 1.125 from it if we can. Each time we succeed, say with log 1.125, we have to multiply the result, say by one and an eighth.
Whereas the logarithm tends to 1 PLUS UNITY-times the argument as it approaches zero, the antilogarithm tends to its own SLOPE. In the jargon, d/dx(exp(x)) tends to exp(x). So we do not just add the sixteen linear bits onto the result, as with the logarithm. Instead, we multiply them by the result obtained so far (by the SLOPE), and then add on.
By the use of exactly the same Magic Numbers for log and antilog,
further savings can be made. We save 64 bytes by using the same table.
|
|
Square root There are two aspects to creating an efficient square root. Firstly, there must be a good first guess. We studied the case of a number such as 1.21 having its virtual 1 removed, giving 0.21, having this halved to 0.105 and having the 1 restored to give 1.105.This style of first guess was the author`s answer to Stavros. It was pointed out that the exponent must be halved also, and that the two processes can be combined. However, if the argument itself is not altered, we will be heading towards the same result for even and odd powers of two. It is unavoidable that for even powers of two we must shift the argument one place to the right. This loses a single bit, equivalent to half a bit in the square root. These routines are assumed to be 9.6329 places of decimals to be used with an 8-decimal display, and having about one and a half places of hidden precision. This is quite usual, and gives us the freedom to have an error of plus or minus 2 at the internal level. Tests would normally be carried out to ensure that one can find the root, the root again X number of times and then square the result X number of times without corrupting the data. It is far better to work this way than to fuss over an error of 1, to slow the system with correction routines, and still end up with an ill-conditioned math that corrupts the data when repeated. Here is a fast square root that puts it all together:
|
|
Tangent The tangent is sine over cosine.Alternatively, make a Chebychev system. Arc Cosine The cosine is adjacent over hypotenuse. The tangent is opposite over adjacent. Hypotenuse squared is opposite squared plus adjacent squared.So you square the cosine (A-squared over H-squared). You take the reciprocal (H-squared over A-squared). You subtract 1 (O-squared over A-squared). You find the square root (O over A). You use the ARC-TANGENT. Alternatively, make a Chebychev system. Arc Sine The sine is opposite over hypotenuse.So you square the sine (O-squared over H-squared). You take the reciprocal (H-squared over O-squared). You subtract 1 (A-squared over O-squared). You find the square root (A over O). You find the reciprocal. You use the ARC-TANGENT. Alternatively, make a Chebychev system. Hyperbolic Sine The Sinh is exp(X) minus exp(-X).So you take the natural antilogarithm, copy it, find the reciprocal of the copy and take that reciprocal away from the antilogarithm. Alternatively, make a Chebychev system. Hyperbolic Cosine The Cosh is exp(X) plus exp(-X).So you take the natural antilogarithm, copy it, find the reciprocal of the copy and add that reciprocal onto the antilogarithm. Alternatively, make a Chebychev system. Arc Hyperbolic Sine Sinh plus Cosh give exp(X).Sinh-squared plus 1 gives Cosh-squared. So you copy your Sinh, square it, add 1 and find the root. Add the two together and use the natural logarithm. Alternatively, make a Chebychev system. Arc Hyperbolic Cosine Sinh plus Cosh give exp(X).Cosh-squared minus 1 gives Sinh-squared. So you copy your Cosh, square it, subtract 1 and find the root. Add the two together and use the natural logarithm. Alternatively, make a Chebychev system.
Chebyshev Panyut Chebyshev was professor of mathematics at St. Petersburg University in Russia in the 19th century. He made a study of the properties of the difference table - the Fluxions of Isaac Newton or the Calculus of Finite Step of Brook Taylor. For computer purposes, the author has classified four kinds of Chebyshev polynomial, the first of which is the original.
Chebyshev Class 1
|
| |
|
|
It is a program to find the polynomial for a cosine. It takes an array of data b#(order#), and repeatedly subtracts one entry from the next.
To conserve space, each subtraction-result is put in the place of one
of the subtraction-arguments. Thus, b#(order#) is subtracted from
b#(order#-1) and put in place of b#(order#). It follows that FOR n = worldorder# TO newworldorder# STEP -1 is clearly a backward step. We expect the coefficients to be 2.08767569878681D-09 0 2.755731922398589D-07 0 2.48015873015873D-05 0 1.388888888888889D-03 0 4.166666666666666D-02 0 0.5 0 1 |
but what we get is
1.468204438707082D-09 6.305890048088956 3.693187649016713D-09 15.86212017051791 -2.865451532454294D-07 -1230.702062016428 2.024703232504431D-08 86.96034167712017 2.477649261364022D-05 106414.2254851703 2.158385460854001D-08 92.70194966529824 -1.388901884932958D-03 -5965288.173139811 5.426053459115588D-09 23.30472215324912 4.166666514144308D-02 178956964.1158813 2.720365172132065D-10 1.168387944748463 -.500000000027346 -2147483648.11745 1.151784868647322D-12 4.946878342867902D-03 1 |
from which we can extract all those coefficients that OUGHT
to be nil, and examine them:
with the figures on the right being what they would be in a 32-bit register.3.693187649016713D-09 16 2.024703232504431D-08 87 2.158385460854001D-08 93 5.426053459115588D-09 23 2.720365172132065D-10 1 1.151784868647322D-12 0 You will see at once that starting with the highest-order coefficient, the deviation from nil rises to a peak of 93/4294967296 before tapering down to nil.
What the program does is to difference numbers, such as the CUBE:
This third-order function is thereby reduced to second-order:
Which reduces to a first-order function:
Which reduces to a zero-order function (a constant): And that constant is the coefficient multiplied by the FACTORIAL of the order.
However, we can subtract and divide by 1:
Subtract and divide by 2:
Subtract and divide by 3: In this way, the division by a factorial takes place automatically. We must also bear in mind that this is a kind of CALCULUS, like dY/dX, except that is is DY/DX, where the large D represents not an infinitesimal but a finite difference.In the case of simple cubes of contiguous integers, the step, DX, is unity. However, in the Qbasic program it was π/2 divided by the desired order.But the McLaurin-Taylor equation tells us that the cosine is an infinite order function. So there will be infinite terms. If we desire to create a 12th order polynomial for cosine, that twelfth coefficient will have to compensate for the missing fourteenth, missing sixteenth and so on. So the 12th order coefficient will be wrong, and its error in turn will affect the 11th, whose error affects the 10th. This leads to what the author calls the Chebyshev error wedge, which tapers down for lower orders. So the Class 1 polynomial has built-in error compensation for the non-infinte nature of the order. It works only within a restricted argument domain, however.
Chebyshev Class 2
Change a single line of the Qbasic program: FOR worldorder# = order# TO 1# STEP -2#and all the coefficients that should be nil are skipped over. Any data in the system can only be resolved into even-order coefficients
|
1.468204438707082D-09 6.305890048088956 -2.599555259157078D-07 -1116.500482222445 2.462861056073645D-05 105779.0769042833 -1.38820109477232D-03 -5962278.30231851 4.166598437043557D-02 178954040.2266679 -.4999999243134452 -2147483322.928722 1 |
|
Chebyshev coefficients Class 2, therefore, have data crushed into selected coefficients to the exclusion of others.
|
|
Chebyshev Class 3
difference:
FOR newworldorder# = 1# TO worldorder#
FOR n = worldorder# TO newworldorder# STEP -1
b#(n) = (b#(n) - b#(n - 1)) * order# / newworldorder#
NEXT n
NEXT newworldorder#
RETURN
removedata:
FOR n# = 0# TO worldorder# - 1#
c# = 1#
d# = n# / order#
FOR m = 1 TO worldorder#
c# = c# * d#
NEXT m
a#(n#) = a#(n#) - (b#(worldorder#) * c#)
NEXT n#
RETURN
|
|
where Pi# and 2# have been removed, the system
no longer gives the result for 1=1 radian, but the result for 1=ninety
degrees. A radian is about 57 degrees, so the numbers are multiplied by powers of (90/57)-squared, pi/2 squared. Our earlier cosine had coefficients as follows: 27 2023 108241 3848192 89607967 1089502240 1003736220 plus 4294967296 4294967295 |
Now we get3.312752312467637D-07 1422.816284179687 -2.37739886444422D-05 -102108.5037233546 9.128504307032955D-04 3920662.746010168 -2.085315136936856D-02 -89563603.1499756 .2536653550224166 1089484403.949509 -1.23370036343147 -5298702714.001476 1 |
|
because INSTEAD OF taking the PURE MATH coefficients of the McLaurin-Taylor equation, and multiplying in the conversion factor, we have INCORPORATED the multiplication by the conversion factor into the Chebyshev analyser. In this case, the multiplication factor is Pi/2, and we incorporated it by NOT using it as part of the specification for the step when differencing, and by NOT using it when correcting the original data. Pi-square divided by eight is 1.23370055. Compare 1.233700363, which is influenced by the Chebyshev error wedge. Class 3 polynomials therefore have some conversion factor built in at the time when the Chebyshev system was evaluating them.
Chebyshev Class 4 In the current case, we are using 32-bit registers. So we multiply the number by 4294967296, find its integer, and divide again by 4294967296. Any error introduced by this approximation at the final coefficient will have to be picked up by the penultimate coefficient. That in turn is forced to be realisable as an integer, throwing the burden of resolving the data to the coefficient before that. Here is how it is done:
where GOSUB integers has been added. This routine is:
and the output is as follows:
3.310851752758026D-07 1422 -2.377154305577278D-05 -102098 9.128390811383724D-04 3920614 -.0208531329408288 -89563524 .2536653475835919 1089484372 -1.233700362965465 -5298702712 1 The cosine of ninety degrees is nil, so the coefficients must add up to nil. This is not the case. It then remains to find the source of the error and correct it before the coefficients can be used. There is unfortunately not enough time to delve deeper into the exact cause of this error. The author had previously made huge mathematical systems where the coefficients had been perfected down to the final binary digit. However, the British government robbed the author of everything. These notes were therefore hastily written in order to preserve the mathematical principles on which those systems were based. Furthermore, as a refugee in Europe, the author has no access to reference manuals on programming. The huge systems designed by the author had been optimised for timing. It is left to those who have access to the appropriate data to substitute faster equivalent codes. The cosine had been achieved to huge precision by the use of two inner programs - MINICOS and MINISIN. These worked over the range zero to π/4. Thus, the principles outlined above can be extended by making all cosines below π/4 use the Minicos routine, whilst the range π/4 to π/2 has the accumulator complemented and the Minisin routine used.
|
|
Gamma (Fractorials) Leonhard Euler did a great deal of work on the generalisation of the factorials. He called his function the GAMMA FUNCTION, and it is unfortunately not compatible with the factorials from which it arose.The Gamma function finds uses in probability theory, and in the Bessel function series used in astronomy. The author preferred to maintain compatability with the factorials, because it is usual for a pocket calculator to refuse to deliver a factorial when the INTERNAL structure of the argument is a non-integer. For example, the display might show a 2, but the calculator displays "E" (for error) when the "!" is pressed. This is because the 2 is contained internally as 1.999999999 or so, which is not an integer. If the "!" function is pressed with 1.999999999 in the registers, the fractorial will be 1.99999999942278, which will be displayed as 2 - the correct factorial for what was seen on the display. The user need not know that the internal workings contain the fractional factorials, fractorials. The factorial of 2 is the fractorial of 2 - which is the gamma of 3. After the author had demonstrated his systems to a mathematician friend, that friend returned and said that there is now a trend amongst Russian mathematicians to discard the gamma function in favour of the generalised factorials. When building the following function into a mathematical system, bear in mind that the addition of the 21 will cause it to overflow its register. Accordingly, the overflow must be caught. The argument +/-X.1234567 would be taken, and the fractional part (multiplied by 4294967296) put into ebx. Then you run the routine. The result in eax will be the fractional factorial of that fraction, or the gamma of one plus that fraction. To create a fractorial system, therefore, if the number was 3.4, for example, you will multiply the result by 1.4, by 2.4 and then by 3.4. Again, in a fractorial system, if the number was -3.4, you evaluate the fractorial of 0.6. Then you DIVIDE by 0.6, DIVIDE by -0.4, DIVIDE by -1.4 and DIVIDE by -2.4. In a pocket calculator, you advise the user who wants the gamma of, say, 5.1, to SUBTRACT 1 and use the "!". This give the fractorial of 4.1, which is the same. The author recommends that the standard "!" sign be replaced by an exclamation mark with a comma in place of the dot. This is the author`s standard way of annotating an instrument to show that it is capable of fractions as well as integers. In a spreadsheet or other program, the subtraction of 1 from the argument can be programmed in, and both functions can be offered. The system is shown displaying the fractorial of a half, which is root-pi divided by two. The display routine has been shortened to eight figures to show the development of the system.
Here are the coefficients as decimal numbers:
The second coefficient - the slope of the fractorials at zero
= was defined by Euler as "little gamma". It is (from memory)
0.577215664, and not as shown. However, these are the Chebyshev
polynomials of a difficult function.
|
|
The Harmonic Function If you take the slope of the factorial and divide it by the actual value at that point, you get the harmonic series.Accordingly, as the factorial of zero is 1, the harmonic number for zero is minus little gamma. It is to Euler`s great credit that three hundred years ago, without the aid of computers, he evaluated little gamma to sixteen decimal places. The author achieved twenty-nine with the aid of computers, and could have evaluated it to any desired accuracy if there was a need. The thirty-two place laboratory mathematics were pressed into service, and high-order polynomials were developed so that the chebyshev error-wedge did not intrude into the lower orders. Studies were then made of the behaviour of these coefficients, as if they had emerged from pure mathematics, to see if some natural law existed like the McLaurin-Taylor equations for Sine, Cosine and Exponential. Discoveries were made that involved hyperbolic cosines in a very tangled way, but Government deliquency was constantly disrupting the research. However, it was found possible to discard the first coefficient, to divide the second (minus gamma) by 1 and use it as the first. The third coefficient was divided by 2 and used as the second, and so on. The result was the DIFFERENTIATION of the entire polynomial, leading to the Harmonic Function. This was probably the first time that this function had been generated. Something new to the world of mathematics. After the author`s home was demolished and all his possessions taken by the government, the author set to work to create a duplicate. On the eve of the Iraq war, at the end of 2002, the government was training killers. "Doctors" Hopton, Alison, Ledward and Walton refused to kill their patients as practice for Iraq. They were all convicted of numerous offences from rape to the misdiagnosis of over 480 healthy children. Their activities had been documented by the government, but had not been punished, for over forty years. That is the blackmail method used to try to keep them obedient. The cancer charities were out in force, gathering funds to finance the war. The government gang had infiltrated even these. "Lord" Archer was in prison. His "Simple Truth" charity had gathered over 50 million pounds Sterling, ostensibly for the Kurds. Instead, it was earmarked for the development of more weapons to test on the Kurds. He had pocketed some of the money, and was punished for a previous crime. So, the gang told their would-be killers to "kill somebody". That somebody was the author, who nearly died but escaped to Europe. He had lost everything. The Queen and her three sons had no part in this. They are, however, powerless, hemmed in by the "Dark Forces" of which the Queen spoke. Armed with the knowledge that the Harmonic number 0 is minus gamma, that Harmonic 1 is 1 more, that Harmonic 2 is a half more again and that Harmonic 3 is a third more again, the author created a Chebyshev polynomial for the series. There is no signum in the inner core of floating point. So it is necessary to choose a range over which all results are positive, and between 1 and 2. The author chose the range 1 to 2, giving 0.422784336 to 0.922784336. Over this range, he used the first polynomial to generate a data array to be resolved into a second polynomial. That is how, without any of his notes, the author was able to resurrect the Harmonic function for the third time. These are not the coefficients that were obtained by differentiating a pseudo-pure-math series. Nor was there time to optimise them, as all the author`s stolen work had been. To use the program, plug in the fractional part of the argument. Then, for values below 1, SUBTRACT the reciprocal of where you are repeatedly, until you reach the argument. For example, if you want H-3.4, you find H1.6 by plugging in 0.6. Then subtract 1/1.6, 1/0.6, 1/-0.4, 1/-1.4 and 1/1/-2.4. If you want the Harmonic function of a number above 2, plug in the fractional part of the number and ADD the reciprocals going upwards. For example, if you want H3.4, you find H1.4 by plugging in 0.4. Then add 1/2.4 and 1/3.4. If commercialised, the least sign of courtesy is to call it "Wehner`s Harmonic Function".
|
|
Input/Output The read-in and display routines were REAL-TIME, just like the functions given above. That means there was no buffer. Data is processed and displayed as it arrives.The simple display routine shown will only deliver numbers of the form "0.abcdefgh" where a to h are valid decimal numbers. Thus, even when the number is actually twice that displayed, we are stuck with fractional numbers. An example is the square root previously given. The square root of 2 is 1.4142136 but will be presented as 0.70710678. Accordingly, it becomes necessary to pre-divide by ten for each place ahead of the decimal point.
One tenth in binary is 0.0001100 11001100 11001100 ..... The accuracy grows exponentially. One 256th added on gives 1.100110011001100 times the number, and one 65536th gives the full 32-bit accuracy.
The display routine for the mathematics that fell into
the hands of the British government worked in a manner equivalent
to the following: (argument in EAX)
Numbers less than unity would have to be pre-multiplied by ten:
I sneaked in the instruction INC EDX or DEC EDX to show that the binary exponent in C is being changed to a decimal one in D. Based on the contents of EDX, a decision is made whether to use scientific notation 1.2345678E39 or 1.2345678E-39, or straight fractional notation 12.345678 or 0.00012345678. In a 32-bit system, after the 21 was added on, eight places can be printed out with rounding. It is not usual to round the numbers for any purpose other than prior to display. These then were the principles upon which the author`s various mathematical systems were based.
|
We have seen how to teach higher mathematics to a silicon imbecile,
to a CPU. The transcendental function programs are the transcendental
meditations of a rock encased in plastic.
(C) 1983-2004 Charles Douglas Wehner. |