Welcome to CML’s documentation!¶
Introduction¶
The C Math Library (CML) is a collection of routines for numerical computing. The routines have been written from scratch in C, and present a modern Applications Programming Interface (API) for C programmers, allowing wrappers to be written for very high level languages. The source code is distributed under the MIT License.
Routines available in CML¶
The library covers a wide range of topics in numerical computing. Routines are available for the following areas,
Complex Numbers | Special Functions | Vectors and Matrices |
Quaternions | Differential Equations | Numerical Differentiation |
IEEE Floating-Point | Physical Constants | Easing Functions |
The use of these routines is described in this manual. Each chapter provides detailed definitions of the functions, followed by example programs and references to the articles on which the algorithms are based.
Conventions used in this manual¶
This manual contains many examples which can be typed at the keyboard. A command entered at the terminal is shown like this:
$ command
The first character on the line is the terminal prompt, and should not be typed. The dollar sign $ is used as the standard prompt in this manual, although some systems may use a different character.
The examples assume the use of the GNU operating system. There may be
minor differences in the output on other systems. The commands for
setting environment variables use the Bourne shell syntax of the
standard GNU shell (bash
).
Using the Library¶
This chapter describes how to compile programs that use CML, and introduces its conventions.
An Example Program¶
The following short program demonstrates the use of the library
#include <stdlib.h>
#include <stdio.h>
#include <cml.h>
int
main(int argc, char const *argv[])
{
cml_complex_t z, w;
z = complex(1.0, 2.0);
w = csin(z);
printf("%g\n", sin(2.0));
printf("%g\n", atan2(2.0, 3.0));
printf("%g\n", creal(w));
printf("%g\n", cimag(w));
return 0;
}
The steps needed to compile this program are described in the following sections.
Compiling and Linking¶
The library header files are installed in their own cml
directory. You should write any preprocessor include statements with a
cml/
directory prefix thus:
#include <cml/math.h>
or simply requiring all the modules in the following way:
#include <cml.h>
If the directory is not installed on the standard search path of your
compiler you will also need to provide its location to the preprocessor
as a command line flag. The default location of the main header file
cml.h
and the cml
directory is /usr/local/include
.
A typical compilation command for a source file example.c
with
the GNU C compiler gcc
is:
$ gcc -Wall -I/usr/local/include -c example.c
This results in an object file example.o
. The default
include path for gcc
searches /usr/local/include
automatically so
the -I
option can actually be omitted when CML is installed
in its default location.
Linking programs with the library¶
The library is installed as a single file, libcml.a
. A shared
version of the library libcml.so
is also installed on systems
that support shared libraries. The default location of these files is
/usr/local/lib
. If this directory is not on the standard search
path of your linker you will also need to provide its location as a
command line flag. The following example shows how to link an application
with the library:
$ gcc -L/usr/local/lib example.o -lcml
The default library path for gcc
searches /usr/local/lib
automatically so the -L
option can be omitted when CML is
installed in its default location.
For a tutorial introduction to the GNU C Compiler and related programs, see “An Introduction to GCC” (ISBN 0954161793). [1]
ANSI C Compliance¶
The library is written in ANSI C and is intended to conform to the ANSI C standard (C89). It should be portable to any system with a working ANSI C compiler.
The library does not rely on any non-ANSI extensions in the interface it exports to the user. Programs you write using CML can be ANSI compliant. Extensions which can be used in a way compatible with pure ANSI C are supported, however, via conditional compilation. This allows the library to take advantage of compiler extensions on those platforms which support them.
When an ANSI C feature is known to be broken on a particular system the library will exclude any related functions at compile-time. This should make it impossible to link a program that would use these functions and give incorrect results.
To avoid namespace conflicts all exported function names and variables
have the prefix cml_
, while exported macros have the prefix
CML_
.
Inline functions¶
The inline
keyword is not part of the original ANSI C standard (C89)
so the library does not export any inline function definitions by default.
Inline functions were introduced officially in the newer C99 standard but
most C89 compilers have also included inline
as an extension for a
long time.
To allow the use of inline functions, the library provides optional inline versions of performance-critical routines by conditional compilation in the exported header files.
By default, the actual form of the inline keyword is extern inline
,
which is a gcc
extension that eliminates unnecessary function
definitions.
When compiling with gcc in C99 mode (gcc -std=c99) the header files automatically switch to C99-compatible inline function declarations instead of extern inline.
Long double¶
In general, the algorithms in the library are written for double
precision only. The long double
type is not supported for
every computation.
One reason for this choice is that the precision of long double
is platform dependent. The IEEE standard only specifies the minimum
precision of extended precision numbers, while the precision of
double
is the same on all platforms.
However, it is sometimes necessary to interact with external data in long-double format, so the structures datatypes include long-double versions.
It should be noted that in some system libraries the stdio.h
formatted input/output functions printf
and scanf
are
not implemented correctly for long double
. Undefined or
incorrect results are avoided by testing these functions during the
configure
stage of library compilation and eliminating certain
CML functions which depend on them if necessary. The corresponding
line in the configure
output looks like this:
checking whether printf works with long double... no
Consequently when long double
formatted input/output does not
work on a given system it should be impossible to link a program which
uses CML functions dependent on this.
If it is necessary to work on a system which does not support formatted
long double
input/output then the options are to use binary
formats or to convert long double
results into double
for
reading and writing.
Compatibility with C++¶
The library header files automatically define functions to have
extern "C"
linkage when included in C++ programs. This allows
the functions to be called directly from C++.
Thread-safety¶
The library can be used in multi-threaded programs. All the functions are thread-safe, in the sense that they do not use static variables. Memory is always associated with objects and not with functions. For functions which use workspace objects as temporary storage the workspaces should be allocated on a per-thread basis. For functions which use table objects as read-only memory the tables can be used by multiple threads simultaneously.
Footnotes
[1] | http://www.network-theory.co.uk/gcc/intro/ |
[2] | /etc/ld.so.conf on GNU/Linux systems |
Error Handling¶
This chapter describes the way that CML functions report and handle errors. By examining the status information returned by every function you can determine whether it succeeded or failed, and if it failed you can find out what the precise cause of failure was. You can also define your own error handling functions to modify the default behavior of the library.
The functions described in this chapter are declared in the header
file cml/errno.h
.
Error Reporting¶
The library follows the thread-safe error reporting conventions of the
POSIX Threads library. Functions return a non-zero error code to
indicate an error and 0
to indicate success:
int status = cml_function (...)
if (status) { /* an error occurred */
.....
/* status value specifies the type of error */
}
The routines report an error whenever they cannot perform the task requested of them. For example, a root-finding function would return a non-zero error code if could not converge to the requested accuracy, or exceeded a limit on the number of iterations. Situations like this are a normal occurrence when using any mathematical library and you should check the return status of the functions that you call.
Whenever a routine reports an error the return value specifies the type
of error. The return value is analogous to the value of the variable
errno
in the C library. The caller can examine the return code
and decide what action to take, including ignoring the error if it is
not considered serious.
In addition to reporting errors by return codes the library also has an
error handler function cml_error()
. This function is called by
other library functions when they report an error, just before they
return to the caller. The default behavior of the error handler is to
print a message and abort the program:
cml: file.c:67: ERROR: invalid argument supplied by user
Default CML error handler invoked.
Aborted
The purpose of the cml_error()
handler is to provide a function
where a breakpoint can be set that will catch library errors when
running under the debugger. It is not intended for use in production
programs, which should handle any errors using the return codes.
Error Codes¶
The error code numbers returned by library functions are defined in
the file cml/errno.h
. They all have the prefix CML_
and
expand to non-zero constant integer values. Error codes above 1024 are
reserved for applications, and are not used by the library. Many of
the error codes use the same base name as the corresponding error code
in the C library. Here are some of the most common error codes,
-
int
CML_EDOM
¶ Domain error; used by mathematical functions when an argument value does not fall into the domain over which the function is defined (like
EDOM
in the C library)
-
int
CML_ERANGE
¶ Range error; used by mathematical functions when the result value is not representable because of overflow or underflow (like
ERANGE
in the C library)
-
int
CML_ENOMEM
¶ No memory available. The system cannot allocate more virtual memory because its capacity is full (like
ENOMEM
in the C library). This error is reported when a CML routine encounters problems when trying to allocate memory withmalloc()
.
-
int
CML_EINVAL
¶ Invalid argument. This is used to indicate various kinds of problems with passing the wrong argument to a library function (like
EINVAL
in the C library).
The error codes can be converted into an error message using the
function cml_strerror()
.
-
const char *
cml_strerror
(const int cml_errno)¶ This function returns a pointer to a string describing the error code
cml_errno
. For example:printf ("error: %s\n", cml_strerror (status));
would print an error message like
error: output range error
for a status value ofCML_ERANGE
.
Error Handlers¶
The default behavior of the CML error handler is to print a short
message and call abort()
. When this default is in use programs
will stop with a core-dump whenever a library routine reports an error.
This is intended as a fail-safe default for programs which do not check
the return status of library routines (we don’t encourage you to write
programs this way).
If you turn off the default error handler it is your responsibility to check the return values of routines and handle them yourself. You can also customize the error behavior by providing a new error handler. For example, an alternative error handler could log all errors to a file, ignore certain error conditions (such as underflows), or start the debugger and attach it to the current process when an error occurs.
All CML error handlers have the type cml_error_handler_t
, which is
defined in cml_errno.h
,
-
cml_error_handler_t
¶ This is the type of CML error handler functions. An error handler will be passed four arguments which specify the reason for the error (a string), the name of the source file in which it occurred (also a string), the line number in that file (an integer) and the error number (an integer). The source file and line number are set at compile time using the
__FILE__
and__LINE__
directives in the preprocessor. An error handler function returns typevoid
. Error handler functions should be defined like this:void handler (const char * reason, const char * file, int line, int cml_errno)
To request the use of your own error handler you need to call the
function cml_set_error_handler()
which is also declared in
cml_errno.h
,
-
cml_error_handler_t *
cml_set_error_handler
(cml_error_handler_t * new_handler)¶ This function sets a new error handler,
new_handler
, for the CML library routines. The previous handler is returned (so that you can restore it later). Note that the pointer to a user defined error handler function is stored in a static variable, so there can be only one error handler per program. This function should be not be used in multi-threaded programs except to set up a program-wide error handler from a master thread. The following example shows how to set and restore a new error handler:/* save original handler, install new handler */ old_handler = cml_set_error_handler (&my_handler); /* code uses new handler */ ..... /* restore original handler */ cml_set_error_handler (old_handler);
To use the default behavior (
abort()
on error) set the error handler toNULL
:old_handler = cml_set_error_handler (NULL);
-
cml_error_handler_t *
cml_set_error_handler_off
()¶ This function turns off the error handler by defining an error handler which does nothing. This will cause the program to continue after any error, so the return values from any library routines must be checked. This is the recommended behavior for production programs. The previous handler is returned (so that you can restore it later).
The error behavior can be changed for specific applications by
recompiling the library with a customized definition of the
CML_ERROR
macro in the file cml_errno.h
.
Using CML error reporting in your own functions¶
If you are writing numerical functions in a program which also uses CML code you may find it convenient to adopt the same error reporting conventions as in the library.
To report an error you need to call the function cml_error()
with a
string describing the error and then return an appropriate error code
from cml_errno.h
, or a special value, such as NaN
. For
convenience the file cml_errno.h
defines two macros which carry
out these steps:
-
CML_ERROR
(reason, cml_errno)¶ This macro reports an error using the CML conventions and returns a status value of
cml_errno
. It expands to the following code fragment:cml_error (reason, __FILE__, __LINE__, cml_errno); return cml_errno;
The macro definition in
cml_errno.h
actually wraps the code in ado { ... } while (0)
block to prevent possible parsing problems.
Here is an example of how the macro could be used to report that a
routine did not achieve a requested tolerance. To report the error the
routine needs to return the error code CML_ETOL
:
if (residual > tolerance)
{
CML_ERROR("residual exceeds tolerance", CML_ETOL);
}
-
CML_ERROR_VAL
(reason, cml_errno, value)¶ This macro is the same as
CML_ERROR
but returns a user-defined value ofvalue
instead of an error code. It can be used for mathematical functions that return a floating point value.
The following example shows how to return a NaN
at a mathematical
singularity using the CML_ERROR_VAL
macro:
if (x == 0)
{
CML_ERROR_VAL("argument lies on singularity", CML_ERANGE, CML_NAN);
}
Examples¶
Here is an example of some code which checks the return value of a function where an error might be reported:
#include <stdio.h>
#include <cml/cml_errno.h>
#include <cml/cml_fft_complex.h>
...
int status;
size_t n = 37;
cml_set_error_handler_off();
status = cml_fft_cml_complex_radix2_forward (data, stride, n);
if (status) {
if (status == CML_EINVAL) {
fprintf (stderr, "invalid argument, n=%d\n", n);
} else {
fprintf (stderr, "failed, cml_errno=%d\n", status);
}
exit (-1);
}
...
The function cml_fft_cml_complex_radix2_forward()
only accepts integer lengths
which are a power of two. If the variable n
is not a power of
two then the call to the library function will return CML_EINVAL
,
indicating that the length argument is invalid. The function call to
cml_set_error_handler_off()
stops the default error handler from
aborting the program. The else
clause catches any other possible
errors.
Mathematical Functions¶
This chapter describes basic mathematical functions. Some of these functions are present in system libraries, but the alternative versions given here can be used as a substitute when the system functions are not available.
The functions and macros described in this chapter are defined in the
header file cml/math.h
.
Mathematical Constants¶
The library ensures that the standard BSD mathematical constants are defined. For reference, here is a list of the constants:
M_E |
The base of exponentials, ![]() |
M_LOG2E |
The base-2 logarithm of ![]() ![]() |
M_LOG10E |
The base-10 logarithm of ![]() ![]() |
M_SQRT2 |
The square root of two, ![]() |
M_SQRT1_2 |
The square root of one-half, ![]() |
M_SQRT3 |
The square root of three, ![]() |
M_PI |
The constant pi, ![]() |
M_PI_2 |
Pi divided by two, ![]() |
M_PI_4 |
Pi divided by four, ![]() |
M_SQRTPI |
The square root of pi, ![]() |
M_2_SQRTPI |
Two divided by the square root of pi, ![]() |
M_1_PI |
The reciprocal of pi, ![]() |
M_2_PI |
Twice the reciprocal of pi, ![]() |
M_LN10 |
The natural logarithm of ten, ![]() |
M_LN2 |
The natural logarithm of two, ![]() |
M_LNPI |
The natural logarithm of pi, ![]() |
M_EULER |
Euler’s constant, ![]() |
Infinities and Not-a-number¶
-
POSINF
¶ This macro contains the IEEE representation of positive infinity,
. It is computed from the expression
+1.0/0.0
.
-
NEGINF
¶ This macro contains the IEEE representation of negative infinity,
. It is computed from the expression
-1.0/0.0
.
-
NAN
¶ This macro contains the IEEE representation of the Not-a-Number symbol,
NaN
. It is computed from the ratio0.0/0.0
.
-
bool
cml_isnan
(double x)¶ This function returns 1 if
x
is not-a-number.
-
bool
cml_isinf
(double x)¶ This function returns
if
x
is positive infinity,if
x
is negative infinity and 0 otherwise. [1]
-
bool
cml_isfinite
(double x)¶ This function returns 1 if
x
is a real number, and 0 if it is infinite or not-a-number.
Elementary Functions¶
The following routines provide portable implementations of functions
found in the BSD math library. When native versions are not available
the functions described here can be used instead. The substitution can
be made automatically if you use autoconf
to compile your
application (see portability-functions).
-
double
cml_log1p
(double x)¶ This function computes the value of
in a way that is accurate for small
x
. It provides an alternative to the BSD math functionlog1p(x)
.
-
double
cml_expm1
(double x)¶ This function computes the value of
in a way that is accurate for small
x
. It provides an alternative to the BSD math functionexpm1(x)
.
-
double
cml_hypot
(double x, double y)¶ This function computes the value of
in a way that avoids overflow. It provides an alternative to the BSD math function
hypot(x,y)
.
-
double
cml_hypot3
(double x, double y, double cml_z)¶ This function computes the value of
in a way that avoids overflow.
-
double
cml_acosh
(double x)¶ This function computes the value of
. It provides an alternative to the standard math function
acosh(x)
.
-
double
cml_asinh
(double x)¶ This function computes the value of
. It provides an alternative to the standard math function
asinh(x)
.
-
double
cml_atanh
(double x)¶ This function computes the value of
. It provides an alternative to the standard math function
atanh(x)
.
-
double
cml_ldexp
(double x, int e)¶ This function computes the value of
. It provides an alternative to the standard math function
ldexp(x,e)
.
-
double
cml_frexp
(double x, int *e)¶ This function splits the number
x
into its normalized fractionand exponent
, such that
and
. The function returns
and stores the exponent in
. If
is zero, both
and
are set to zero. This function provides an alternative to the standard math function
frexp(x, e)
.
Small integer powers¶
A common complaint about the standard C library is its lack of a function for calculating (small) integer powers. CML provides some simple functions to fill this gap. For reasons of efficiency, these functions do not check for overflow or underflow conditions.
-
double
cml_pow_int
(double x, int n)¶ -
double
cml_pow_uint
(double x, unsigned int n)¶ These routines computes the power
for integer
n
. The power is computed efficiently—for example,is computed as
, requiring only 3 multiplications.
-
double
cml_pow_2
(double x)¶ -
double
cml_pow_3
(double x)¶ -
double
cml_pow_4
(double x)¶ -
double
cml_pow_5
(double x)¶ -
double
cml_pow_6
(double x)¶ -
double
cml_pow_7
(double x)¶ -
double
cml_pow_8
(double x)¶ -
double
cml_pow_9
(double x)¶ These functions can be used to compute small integer powers
,
, etc. efficiently. The functions will be inlined when
HAVE_INLINE
is defined, so that use of these functions should be as efficient as explicitly writing the corresponding product expression:#include <cml.h> [...] double y = pow_4(3.141); /* compute 3.141**4 */
Testing the Sign of Numbers¶
-
double
cml_sgn
(double x)¶ This macro returns the sign of
x
. It is defined as((x) >= 0 ? 1 : -1)
. Note that with this definition the sign of zero is positive (regardless of its IEEE sign bit).
Maximum and Minimum functions¶
Note that the following macros perform multiple evaluations of their arguments, so they should not be used with arguments that have side effects (such as a call to a random number generator).
-
CML_MAX
(a, b)¶ This macro returns the maximum of
a
andb
. It is defined as((a) > (b) ? (a):(b))
.
-
CML_MIN
(a, b)¶ This macro returns the minimum of
a
andb
. It is defined as((a) < (b) ? (a):(b))
.
Approximate Comparison of Floating Point Numbers¶
It is sometimes useful to be able to compare two floating point numbers approximately, to allow for rounding and truncation errors. The following function implements the approximate floating-point comparison algorithm proposed by D.E. Knuth in Section 4.2.2 of “Seminumerical Algorithms” (3rd edition).
-
bool
cml_cmp
(double x, double y, double epsilon)¶ This function determines whether
x
andy
are approximately equal to a relative accuracyepsilon
.The relative accuracy is measured using an interval of size
, where
and
is the maximum base-2 exponent of
and
as computed by the function
frexp()
.If
and
lie within this interval, they are considered approximately equal and the function returns 0. Otherwise if
, the function returns
, or if
, the function returns
.
Note that
and
are compared to relative accuracy, so this function is not suitable for testing whether a value is approximately zero.
The implementation is based on the package
fcmp
by T.C. Belding.
Footnotes
[1] | Note that the C99 standard only requires the
system isinf() function to return a non-zero value, without the
sign of the infinity. The implementation in some earlier versions of
CML used the system isinf() function and may have this behavior
on some platforms. Therefore, it is advisable to test the sign of
x separately, if needed, rather than relying the sign of the
return value from isinf() . |
Complex Numbers¶
The functions described in this chapter provide support for complex numbers. The algorithms take care to avoid unnecessary intermediate underflows and overflows, allowing the functions to be evaluated over as much of the complex plane as possible.
The complex types, functions and arithmetic operations are defined in
the header file cml/complex.h
.
Representation of complex numbers¶
Complex numbers are represented using the type cml_complex_t
. The
internal representation of this type may vary across platforms and
should not be accessed directly. The functions and macros described
below allow complex numbers to be manipulated in a portable way.
For reference, the default form of the cml_complex_t
type is
given by the following struct:
typedef struct _complex
{
union
{
double p[2];
double parts[2];
struct
{
double re;
double im;
};
struct
{
double real;
double imaginary;
};
};
} cml_complex_t;
The real and imaginary part are stored in contiguous elements of a two
element array. This eliminates any padding between the real and
imaginary parts, parts[0]
and parts[1]
, allowing the struct to
be mapped correctly onto packed complex arrays.
-
cml_complex_t
complex
(double x, double y)¶ This function uses the rectangular Cartesian components
to return the complex number
. An inline version of this function is used when
HAVE_INLINE
is defined.
-
cml_complex_t
cml_complex_polar
(double r, double theta)¶ This function returns the complex number
from the polar representation (
r
,theta
).
Properties of complex numbers¶
-
double
cml_complex_arg
(cml_complex_t z)¶ This function returns the argument of the complex number
z
,, where
.
-
double
cml_complex_abs
(cml_complex_t z)¶ This function returns the magnitude of the complex number
z
,.
-
double
cml_complex_abs2
(cml_complex_t z)¶ This function returns the squared magnitude of the complex number
z
,.
-
double
cml_complex_logabs
(cml_complex_t z)¶ This function returns the natural logarithm of the magnitude of the complex number
z
,. It allows an accurate evaluation of
when
is close to one. The direct evaluation of
log(cml_complex_abs(z))
would lead to a loss of precision in this case.
Complex arithmetic operators¶
-
cml_complex_t
cml_complex_add
(cml_complex_t a, cml_complex_t b)¶ This function returns the sum of the complex numbers
a
andb
,.
-
cml_complex_t
cml_complex_sub
(cml_complex_t a, cml_complex_t b)¶ This function returns the difference of the complex numbers
a
andb
,.
-
cml_complex_t
cml_complex_mul
(cml_complex_t a, cml_complex_t b)¶ This function returns the product of the complex numbers
a
andb
,.
-
cml_complex_t
cml_complex_div
(cml_complex_t a, cml_complex_t b)¶ This function returns the quotient of the complex numbers
a
andb
,.
-
cml_complex_t
cml_complex_add_real
(cml_complex_t a, double x)¶ This function returns the sum of the complex number
a
and the real numberx
,.
-
cml_complex_t
cml_complex_sub_real
(cml_complex_t a, double x)¶ This function returns the difference of the complex number
a
and the real numberx
,.
-
cml_complex_t
cml_complex_mul_real
(cml_complex_t a, double x)¶ This function returns the product of the complex number
a
and the real numberx
,.
-
cml_complex_t
cml_complex_div_real
(cml_complex_t a, double x)¶ This function returns the quotient of the complex number
a
and the real numberx
,.
-
cml_complex_t
cml_complex_add_imag
(cml_complex_t a, double y)¶ This function returns the sum of the complex number
a
and the imaginary number,
.
-
cml_complex_t
cml_complex_sub_imag
(cml_complex_t a, double y)¶ This function returns the difference of the complex number
a
and the imaginary number,
.
-
cml_complex_t
cml_complex_mul_imag
(cml_complex_t a, double y)¶ This function returns the product of the complex number
a
and the imaginary number,
.
-
cml_complex_t
cml_complex_div_imag
(cml_complex_t a, double y)¶ This function returns the quotient of the complex number
a
and the imaginary number,
.
-
cml_complex_t
cml_complex_conj
(cml_complex_t z)¶ This function returns the complex conjugate of the complex number
z
,.
-
cml_complex_t
cml_complex_inverse
(cml_complex_t z)¶ This function returns the inverse, or reciprocal, of the complex number
z
,.
-
cml_complex_t
cml_complex_negative
(cml_complex_t z)¶ This function returns the negative of the complex number
z
,.
Elementary Complex Functions¶
-
cml_complex_t
cml_complex_sqrt
(cml_complex_t z)¶ This function returns the square root of the complex number
z
,. The branch cut is the negative real axis. The result always lies in the right half of the complex plane.
-
cml_complex_t
cml_complex_sqrt_real
(double x)¶ This function returns the complex square root of the real number
x
, wherex
may be negative.
-
cml_complex_t
cml_complex_pow
(cml_complex_t z, cml_complex_t a)¶ The function returns the complex number
z
raised to the complex powera
,. This is computed as
using complex logarithms and complex exponentials.
-
cml_complex_t
cml_complex_pow_real
(cml_complex_t z, double x)¶ This function returns the complex number
z
raised to the real powerx
,.
-
cml_complex_t
cml_complex_exp
(cml_complex_t z)¶ This function returns the complex exponential of the complex number
z
,.
-
cml_complex_t
cml_complex_log
(cml_complex_t z)¶ This function returns the complex natural logarithm (base
) of the complex number
z
,. The branch cut is the negative real axis.
-
cml_complex_t
cml_complex_log10
(cml_complex_t z)¶ This function returns the complex base-10 logarithm of the complex number
z
,.
-
cml_complex_t
cml_complex_log_b
(cml_complex_t z, cml_complex_t b)¶ This function returns the complex base-
b
logarithm of the complex numberz
,. This quantity is computed as the ratio
.
Complex Trigonometric Functions¶
-
cml_complex_t
cml_complex_sin
(cml_complex_t z)¶ This function returns the complex sine of the complex number
z
,.
-
cml_complex_t
cml_complex_cos
(cml_complex_t z)¶ This function returns the complex cosine of the complex number
z
,.
-
cml_complex_t
cml_complex_tan
(cml_complex_t z)¶ This function returns the complex tangent of the complex number
z
,.
-
cml_complex_t
cml_complex_sec
(cml_complex_t z)¶ This function returns the complex secant of the complex number
z
,.
-
cml_complex_t
cml_complex_csc
(cml_complex_t z)¶ This function returns the complex cosecant of the complex number
z
,.
-
cml_complex_t
cml_complex_cot
(cml_complex_t z)¶ This function returns the complex cotangent of the complex number
z
,.
Inverse Complex Trigonometric Functions¶
-
cml_complex_t
cml_complex_asin
(cml_complex_t z)¶ This function returns the complex arcsine of the complex number
z
,. The branch cuts are on the real axis, less than
and greater than
.
-
cml_complex_t
cml_complex_asin_real
(double z)¶ This function returns the complex arcsine of the real number
z
,. For
between
and
, the function returns a real value in the range
. For
less than
the result has a real part of
and a positive imaginary part. For
greater than
the result has a real part of
and a negative imaginary part.
-
cml_complex_t
cml_complex_acos
(cml_complex_t z)¶ This function returns the complex arccosine of the complex number
z
,. The branch cuts are on the real axis, less than
and greater than
.
-
cml_complex_t
cml_complex_acos_real
(double z)¶ This function returns the complex arccosine of the real number
z
,. For
between
and
, the function returns a real value in the range
. For
less than
the result has a real part of
and a negative imaginary part. For
greater than
the result is purely imaginary and positive.
-
cml_complex_t
cml_complex_atan
(cml_complex_t z)¶ This function returns the complex arctangent of the complex number
z
,. The branch cuts are on the imaginary axis, below
and above
.
-
cml_complex_t
cml_complex_asec
(cml_complex_t z)¶ This function returns the complex arcsecant of the complex number
z
,.
-
cml_complex_t
cml_complex_asec_real
(double z)¶ This function returns the complex arcsecant of the real number
z
,.
-
cml_complex_t
cml_complex_acsc
(cml_complex_t z)¶ This function returns the complex arccosecant of the complex number
z
,.
-
cml_complex_t
cml_complex_acsc_real
(double z)¶ This function returns the complex arccosecant of the real number
z
,.
-
cml_complex_t
cml_complex_acot
(cml_complex_t z)¶ This function returns the complex arccotangent of the complex number
z
,.
Complex Hyperbolic Functions¶
-
cml_complex_t
cml_complex_sinh
(cml_complex_t z)¶ This function returns the complex hyperbolic sine of the complex number
z
,.
-
cml_complex_t
cml_complex_cosh
(cml_complex_t z)¶ This function returns the complex hyperbolic cosine of the complex number
z
,.
-
cml_complex_t
cml_complex_tanh
(cml_complex_t z)¶ This function returns the complex hyperbolic tangent of the complex number
z
,.
-
cml_complex_t
cml_complex_sech
(cml_complex_t z)¶ This function returns the complex hyperbolic secant of the complex number
z
,.
-
cml_complex_t
cml_complex_csch
(cml_complex_t z)¶ This function returns the complex hyperbolic cosecant of the complex number
z
,.
-
cml_complex_t
cml_complex_coth
(cml_complex_t z)¶ This function returns the complex hyperbolic cotangent of the complex number
z
,.
Inverse Complex Hyperbolic Functions¶
-
cml_complex_t
cml_complex_asinh
(cml_complex_t z)¶ This function returns the complex hyperbolic arcsine of the complex number
z
,. The branch cuts are on the imaginary axis, below
and above
.
-
cml_complex_t
cml_complex_acosh
(cml_complex_t z)¶ This function returns the complex hyperbolic arccosine of the complex number
z
,. The branch cut is on the real axis, less than
. Note that in this case we use the negative square root in formula 4.6.21 of Abramowitz & Stegun giving
.
-
cml_complex_t
cml_complex_acosh_real
(double z)¶ This function returns the complex hyperbolic arccosine of the real number
z
,.
-
cml_complex_t
cml_complex_atanh
(cml_complex_t z)¶ This function returns the complex hyperbolic arctangent of the complex number
z
,. The branch cuts are on the real axis, less than
and greater than
.
-
cml_complex_t
cml_complex_atanh_real
(double z)¶ This function returns the complex hyperbolic arctangent of the real number
z
,.
-
cml_complex_t
cml_complex_asech
(cml_complex_t z)¶ This function returns the complex hyperbolic arcsecant of the complex number
z
,.
-
cml_complex_t
cml_complex_acsch
(cml_complex_t z)¶ This function returns the complex hyperbolic arccosecant of the complex number
z
,.
-
cml_complex_t
cml_complex_acoth
(cml_complex_t z)¶ This function returns the complex hyperbolic arccotangent of the complex number
z
,.
References and Further Reading¶
The implementations of the elementary and trigonometric functions are based on the following papers,
- T. E. Hull, Thomas F. Fairgrieve, Ping Tak Peter Tang, “Implementing Complex Elementary Functions Using Exception Handling”, ACM Transactions on Mathematical Software, Volume 20 (1994), pp 215–244, Corrigenda, p553
- T. E. Hull, Thomas F. Fairgrieve, Ping Tak Peter Tang, “Implementing the complex arcsin and arccosine functions using exception handling”, ACM Transactions on Mathematical Software, Volume 23 (1997) pp 299–335
The general formulas and details of branch cuts can be found in the following books,
- Abramowitz and Stegun, Handbook of Mathematical Functions, “Circular Functions in Terms of Real and Imaginary Parts”, Formulas 4.3.55–58, “Inverse Circular Functions in Terms of Real and Imaginary Parts”, Formulas 4.4.37–39, “Hyperbolic Functions in Terms of Real and Imaginary Parts”, Formulas 4.5.49–52, “Inverse Hyperbolic Functions—relation to Inverse Circular Functions”, Formulas 4.6.14–19.
- Dave Gillespie, Calc Manual, Free Software Foundation, ISBN 1-882114-18-3
Numerical Differentiation¶
The functions described in this chapter compute numerical derivatives by finite differencing. An adaptive algorithm is used to find the best choice of finite difference and to estimate the error in the derivative.
The functions described in this chapter are declared in the header
file cml/deriv.h
.
Functions¶
-
int
cml_deriv_central
(const cml_function_t *f, double x, double h, double *result, double *abserr)¶ This function computes the numerical derivative of the function
f
at the pointx
using an adaptive central difference algorithm with a step-size ofh
. The derivative is returned inresult
and an estimate of its absolute error is returned inabserr
.The initial value of
h
is used to estimate an optimal step-size, based on the scaling of the truncation error and round-off error in the derivative calculation. The derivative is computed using a 5-point rule for equally spaced abscissae at,
,
,
,
, with an error estimate taken from the difference between the 5-point rule and the corresponding 3-point rule
,
,
. Note that the value of the function at
does not contribute to the derivative calculation, so only 4-points are actually used.
-
int
cml_deriv_forward
(const cml_function_t *f, double x, double h, double *result, double *abserr)¶ This function computes the numerical derivative of the function
f
at the pointx
using an adaptive forward difference algorithm with a step-size ofh
. The function is evaluated only at points greater thanx
, and never atx
itself. The derivative is returned inresult
and an estimate of its absolute error is returned inabserr
. This function should be used ifhas a discontinuity at
x
, or is undefined for values less thanx
.The initial value of
h
is used to estimate an optimal step-size, based on the scaling of the truncation error and round-off error in the derivative calculation. The derivative atis computed using an “open” 4-point rule for equally spaced abscissae at
,
,
,
, with an error estimate taken from the difference between the 4-point rule and the corresponding 2-point rule
,
.
-
int
cml_deriv_backward
(const cml_function_t *f, double x, double h, double *result, double *abserr)¶ This function computes the numerical derivative of the function
f
at the pointx
using an adaptive backward difference algorithm with a step-size ofh
. The function is evaluated only at points less thanx
, and never atx
itself. The derivative is returned inresult
and an estimate of its absolute error is returned inabserr
. This function should be used ifhas a discontinuity at
x
, or is undefined for values greater thanx
.This function is equivalent to calling
cml_deriv_forward()
with a negative step-size.
Examples¶
The following code estimates the derivative of the function
at
and at
. The function
is
undefined for
so the derivative at
is computed
using
cml_deriv_forward()
.
#include <stdio.h>
#include <cml.h>
double
f(double x, void *params)
{
(void)(params); /* avoid unused parameter warning */
return pow(x, 1.5);
}
int
main(void)
{
cml_function_t F;
double result, abserr;
F.function = &f;
F.params = 0;
printf("f(x) = x^(3/2)\n");
cml_deriv_central(&F, 2.0, 1e-8, &result, &abserr);
printf("x = 2.0\n");
printf("f'(x) = %.10f +/- %.10f\n", result, abserr);
printf("exact = %.10f\n\n", 1.5 * sqrt(2.0));
cml_deriv_forward (&F, 0.0, 1e-8, &result, &abserr);
printf("x = 0.0\n");
printf("f'(x) = %.10f +/- %.10f\n", result, abserr);
printf("exact = %.10f\n", 0.0);
return 0;
}
Here is the output of the program,
f(x) = x^(3/2)
x = 2.0
f'(x) = 2.1213203120 +/- 0.0000005006
exact = 2.1213203436
x = 0.0
f'(x) = 0.0000000160 +/- 0.0000000339
exact = 0.0000000000
References and Further Reading¶
The algorithms used by these functions are described in the following sources:
- Abramowitz and Stegun, Handbook of Mathematical Functions, Section 25.3.4, and Table 25.5 (Coefficients for Differentiation).
- S.D. Conte and Carl de Boor, Elementary Numerical Analysis: An Algorithmic Approach, McGraw-Hill, 1972.
Easings Functions¶
The functions described in this chapter are declared in the header
file cml/easings.h
.
The easing functions are an implementation of the functions presented in http://easings.net/, useful particularly for animations. Easing is a method of distorting time to control apparent motion in animation. It is most commonly used for slow-in, slow-out. By easing time, animated transitions are smoother and exhibit more plausible motion.
Easing functions take a value inside the range [0.0, 1.0]
and usually will
return a value inside that same range. However, in some of the easing
functions, the returned value extrapolate that range
http://easings.net/ to see those functions).
The following types of easing functions are supported:
Linear
Quadratic
Cubic
Quartic
Quintic
Sine
Circular
Exponential
Elastic
Bounce
Back
The core easing functions are implemented as C functions that take a time parameter and return a progress parameter, which can subsequently be used to interpolate any quantity.
Physical Constants¶
This chapter describes macros for the values of physical constants, such
as the speed of light, , and gravitational constant,
.
The values are available in different unit systems, including the
standard MKSA system (meters, kilograms, seconds, amperes) and the CGSM
system (centimeters, grams, seconds, gauss), which is commonly used in
Astronomy.
The full list of constants is described briefly below. Consult the header files themselves for the values of the constants used in the library.
Fundamental Constants¶
-
CML_CONST_MKSA_SPEED_OF_LIGHT
¶ The speed of light in vacuum,
.
-
CML_CONST_MKSA_VACUUM_PERMEABILITY
¶ The permeability of free space,
. This constant is defined in the MKSA system only.
-
CML_CONST_MKSA_VACUUM_PERMITTIVITY
¶ The permittivity of free space,
. This constant is defined in the MKSA system only.
-
CML_CONST_MKSA_PLANCKS_CONSTANT_H
¶ Planck’s constant,
.
-
CML_CONST_MKSA_PLANCKS_CONSTANT_HBAR
¶ Planck’s constant divided by
,
.
-
CML_CONST_NUM_AVOGADRO
¶ Avogadro’s number,
.
-
CML_CONST_MKSA_FARADAY
¶ The molar charge of 1 Faraday.
-
CML_CONST_MKSA_BOLTZMANN
¶ The Boltzmann constant,
.
-
CML_CONST_MKSA_MOLAR_GAS
¶ The molar gas constant,
.
-
CML_CONST_MKSA_STANDARD_GAS_VOLUME
¶ The standard gas volume,
.
-
CML_CONST_MKSA_STEFAN_BOLTZMANN_CONSTANT
¶ The Stefan-Boltzmann radiation constant,
.
-
CML_CONST_MKSA_GAUSS
¶ The magnetic field of 1 Gauss.
Astronomy and Astrophysics¶
-
CML_CONST_MKSA_ASTRONOMICAL_UNIT
¶ The length of 1 astronomical unit (mean earth-sun distance),
.
-
CML_CONST_MKSA_GRAVITATIONAL_CONSTANT
¶ The gravitational constant,
.
-
CML_CONST_MKSA_LIGHT_YEAR
¶ The distance of 1 light-year,
.
-
CML_CONST_MKSA_PARSEC
¶ The distance of 1 parsec,
.
-
CML_CONST_MKSA_GRAV_ACCEL
¶ The standard gravitational acceleration on Earth,
.
-
CML_CONST_MKSA_SOLAR_MASS
¶ The mass of the Sun.
Atomic and Nuclear Physics¶
-
CML_CONST_MKSA_ELECTRON_CHARGE
¶ The charge of the electron,
.
-
CML_CONST_MKSA_ELECTRON_VOLT
¶ The energy of 1 electron volt,
.
-
CML_CONST_MKSA_UNIFIED_ATOMIC_MASS
¶ The unified atomic mass,
.
-
CML_CONST_MKSA_MASS_ELECTRON
¶ The mass of the electron,
.
-
CML_CONST_MKSA_MASS_MUON
¶ The mass of the muon,
.
-
CML_CONST_MKSA_MASS_PROTON
¶ The mass of the proton,
.
-
CML_CONST_MKSA_MASS_NEUTRON
¶ The mass of the neutron,
.
-
CML_CONST_NUM_FINE_STRUCTURE
¶ The electromagnetic fine structure constant
.
-
CML_CONST_MKSA_RYDBERG
¶ The Rydberg constant,
, in units of energy. This is related to the Rydberg inverse wavelength
by
.
-
CML_CONST_MKSA_BOHR_RADIUS
¶ The Bohr radius,
.
-
CML_CONST_MKSA_ANGSTROM
¶ The length of 1 angstrom.
-
CML_CONST_MKSA_BARN
¶ The area of 1 barn.
-
CML_CONST_MKSA_BOHR_MAGNETON
¶ The Bohr Magneton,
.
-
CML_CONST_MKSA_NUCLEAR_MAGNETON
¶ The Nuclear Magneton,
.
-
CML_CONST_MKSA_ELECTRON_MAGNETIC_MOMENT
¶ The absolute value of the magnetic moment of the electron,
. The physical magnetic moment of the electron is negative.
-
CML_CONST_MKSA_PROTON_MAGNETIC_MOMENT
¶ The magnetic moment of the proton,
.
-
CML_CONST_MKSA_THOMSON_CROSS_SECTION
¶ The Thomson cross section,
.
-
CML_CONST_MKSA_DEBYE
¶ The electric dipole moment of 1 Debye,
.
Measurement of Time¶
-
CML_CONST_MKSA_MINUTE
¶ The number of seconds in 1 minute.
-
CML_CONST_MKSA_HOUR
¶ The number of seconds in 1 hour.
-
CML_CONST_MKSA_DAY
¶ The number of seconds in 1 day.
-
CML_CONST_MKSA_WEEK
¶ The number of seconds in 1 week.
Imperial Units¶
-
CML_CONST_MKSA_INCH
¶ The length of 1 inch.
-
CML_CONST_MKSA_FOOT
¶ The length of 1 foot.
-
CML_CONST_MKSA_YARD
¶ The length of 1 yard.
-
CML_CONST_MKSA_MILE
¶ The length of 1 mile.
-
CML_CONST_MKSA_MIL
¶ The length of 1 mil (1/1000th of an inch).
Speed and Nautical Units¶
-
CML_CONST_MKSA_KILOMETERS_PER_HOUR
¶ The speed of 1 kilometer per hour.
-
CML_CONST_MKSA_MILES_PER_HOUR
¶ The speed of 1 mile per hour.
-
CML_CONST_MKSA_NAUTICAL_MILE
¶ The length of 1 nautical mile.
-
CML_CONST_MKSA_FATHOM
¶ The length of 1 fathom.
-
CML_CONST_MKSA_KNOT
¶ The speed of 1 knot.
Printers Units¶
-
CML_CONST_MKSA_POINT
¶ The length of 1 printer’s point (1/72 inch).
-
CML_CONST_MKSA_TEXPOINT
¶ The length of 1 TeX point (1/72.27 inch).
Volume, Area and Length¶
-
CML_CONST_MKSA_MICRON
¶ The length of 1 micron.
-
CML_CONST_MKSA_HECTARE
¶ The area of 1 hectare.
-
CML_CONST_MKSA_ACRE
¶ The area of 1 acre.
-
CML_CONST_MKSA_LITER
¶ The volume of 1 liter.
-
CML_CONST_MKSA_US_GALLON
¶ The volume of 1 US gallon.
-
CML_CONST_MKSA_CANADIAN_GALLON
¶ The volume of 1 Canadian gallon.
-
CML_CONST_MKSA_UK_GALLON
¶ The volume of 1 UK gallon.
-
CML_CONST_MKSA_QUART
¶ The volume of 1 quart.
-
CML_CONST_MKSA_PINT
¶ The volume of 1 pint.
Mass and Weight¶
-
CML_CONST_MKSA_POUND_MASS
¶ The mass of 1 pound.
-
CML_CONST_MKSA_OUNCE_MASS
¶ The mass of 1 ounce.
-
CML_CONST_MKSA_TON
¶ The mass of 1 ton.
-
CML_CONST_MKSA_METRIC_TON
¶ The mass of 1 metric ton (1000 kg).
-
CML_CONST_MKSA_UK_TON
¶ The mass of 1 UK ton.
-
CML_CONST_MKSA_TROY_OUNCE
¶ The mass of 1 troy ounce.
-
CML_CONST_MKSA_CARAT
¶ The mass of 1 carat.
-
CML_CONST_MKSA_GRAM_FORCE
¶ The force of 1 gram weight.
-
CML_CONST_MKSA_POUND_FORCE
¶ The force of 1 pound weight.
-
CML_CONST_MKSA_KILOPOUND_FORCE
¶ The force of 1 kilopound weight.
-
CML_CONST_MKSA_POUNDAL
¶ The force of 1 poundal.
Thermal Energy and Power¶
-
CML_CONST_MKSA_CALORIE
¶ The energy of 1 calorie.
-
CML_CONST_MKSA_BTU
¶ The energy of 1 British Thermal Unit,
.
-
CML_CONST_MKSA_THERM
¶ The energy of 1 Therm.
-
CML_CONST_MKSA_HORSEPOWER
¶ The power of 1 horsepower.
Pressure¶
-
CML_CONST_MKSA_BAR
¶ The pressure of 1 bar.
-
CML_CONST_MKSA_STD_ATMOSPHERE
¶ The pressure of 1 standard atmosphere.
-
CML_CONST_MKSA_TORR
¶ The pressure of 1 torr.
-
CML_CONST_MKSA_METER_OF_MERCURY
¶ The pressure of 1 meter of mercury.
-
CML_CONST_MKSA_INCH_OF_MERCURY
¶ The pressure of 1 inch of mercury.
-
CML_CONST_MKSA_INCH_OF_WATER
¶ The pressure of 1 inch of water.
-
CML_CONST_MKSA_PSI
¶ The pressure of 1 pound per square inch.
Viscosity¶
-
CML_CONST_MKSA_POISE
¶ The dynamic viscosity of 1 poise.
-
CML_CONST_MKSA_STOKES
¶ The kinematic viscosity of 1 stokes.
Light and Illumination¶
-
CML_CONST_MKSA_STILB
¶ The luminance of 1 stilb.
-
CML_CONST_MKSA_LUMEN
¶ The luminous flux of 1 lumen.
-
CML_CONST_MKSA_LUX
¶ The illuminance of 1 lux.
-
CML_CONST_MKSA_PHOT
¶ The illuminance of 1 phot.
-
CML_CONST_MKSA_FOOTCANDLE
¶ The illuminance of 1 footcandle.
-
CML_CONST_MKSA_LAMBERT
¶ The luminance of 1 lambert.
-
CML_CONST_MKSA_FOOTLAMBERT
¶ The luminance of 1 footlambert.
Radioactivity¶
-
CML_CONST_MKSA_CURIE
¶ The activity of 1 curie.
-
CML_CONST_MKSA_ROENTGEN
¶ The exposure of 1 roentgen.
-
CML_CONST_MKSA_RAD
¶ The absorbed dose of 1 rad.
Force and Energy¶
-
CML_CONST_MKSA_NEWTON
¶ The SI unit of force, 1 Newton.
-
CML_CONST_MKSA_DYNE
¶ The force of 1 Dyne =
Newton.
-
CML_CONST_MKSA_JOULE
¶ The SI unit of energy, 1 Joule.
-
CML_CONST_MKSA_ERG
¶ The energy 1 erg =
Joule.
Prefixes¶
These constants are dimensionless scaling factors.
-
CML_CONST_NUM_YOTTA
¶
-
CML_CONST_NUM_ZETTA
¶
-
CML_CONST_NUM_EXA
¶
-
CML_CONST_NUM_PETA
¶
-
CML_CONST_NUM_TERA
¶
-
CML_CONST_NUM_GIGA
¶
-
CML_CONST_NUM_MEGA
¶
-
CML_CONST_NUM_KILO
¶
-
CML_CONST_NUM_MILLI
¶
-
CML_CONST_NUM_MICRO
¶
-
CML_CONST_NUM_NANO
¶
-
CML_CONST_NUM_PICO
¶
-
CML_CONST_NUM_FEMTO
¶
-
CML_CONST_NUM_ATTO
¶
-
CML_CONST_NUM_ZEPTO
¶
-
CML_CONST_NUM_YOCTO
¶
Examples¶
The following program demonstrates the use of the physical constants in a calculation. In this case, the goal is to calculate the range of light-travel times from Earth to Mars.
The required data is the average distance of each planet from the Sun in astronomical units (the eccentricities and inclinations of the orbits will be neglected for the purposes of this calculation). The average radius of the orbit of Mars is 1.52 astronomical units, and for the orbit of Earth it is 1 astronomical unit (by definition). These values are combined with the MKSA values of the constants for the speed of light and the length of an astronomical unit to produce a result for the shortest and longest light-travel times in seconds. The figures are converted into minutes before being displayed.
#include <stdio.h>
#include <cml.h>
int
main(void)
{
double c = CML_CONST_MKSA_SPEED_OF_LIGHT;
double au = CML_CONST_MKSA_ASTRONOMICAL_UNIT;
double minutes = CML_CONST_MKSA_MINUTE;
/* distance stored in meters */
double r_earth = 1.00 * au;
double r_mars = 1.52 * au;
double t_min, t_max;
t_min = (r_mars - r_earth) / c;
t_max = (r_mars + r_earth) / c;
printf("light travel time from Earth to Mars:\n");
printf("minimum = %.1f minutes\n", t_min / minutes);
printf("maximum = %.1f minutes\n", t_max / minutes);
return 0;
}
Here is the output from the program,
light travel time from Earth to Mars:
minimum = 4.3 minutes
maximum = 21.0 minutes
References and Further Reading¶
The authoritative sources for physical constants are the 2006 CODATA recommended values, published in the article below. Further information on the values of physical constants is also available from the NIST website.
- P.J. Mohr, B.N. Taylor, D.B. Newell, “CODATA Recommended Values of the Fundamental Physical Constants: 2006”, Reviews of Modern Physics, 80(2), pp. 633–730 (2008).
- http://www.physics.nist.gov/cuu/Constants/index.html
- http://physics.nist.gov/Pubs/SP811/appenB9.html
IEEE floating-point arithmetic¶
This chapter describes functions for examining the representation of
floating point numbers and controlling the floating point environment of
your program. The functions described in this chapter are declared in
the header file cml/ieee.h
.
Representation of floating point numbers¶
The IEEE Standard for Binary Floating-Point Arithmetic defines binary
formats for single and double precision numbers. Each number is composed
of three parts: a sign bit (), an exponent
(
) and a fraction (
). The numerical value of the
combination
is given by the following formula,
The sign bit is either zero or one. The exponent ranges from a minimum value
to a maximum value
depending on the precision. The exponent is converted to an
unsigned number
, known as the biased exponent, for storage by adding a
bias parameter,
The sequence represents the digits of the binary
fraction
. The binary digits are stored in normalized
form, by adjusting the exponent to give a leading digit of
.
Since the leading digit is always 1 for normalized numbers it is
assumed implicitly and does not have to be stored.
Numbers smaller than
are be stored in denormalized form with a leading zero,
This allows gradual underflow down to
for
bits of precision.
A zero is encoded with the special exponent of
and infinities with the exponent of
.
The format for single precision numbers uses 32 bits divided in the following way:
seeeeeeeefffffffffffffffffffffff
s = sign bit, 1 bit
e = exponent, 8 bits (E_min=-126, E_max=127, bias=127)
f = fraction, 23 bits
The format for double precision numbers uses 64 bits divided in the following way:
seeeeeeeeeeeffffffffffffffffffffffffffffffffffffffffffffffffffff
s = sign bit, 1 bit
e = exponent, 11 bits (E_min=-1022, E_max=1023, bias=1023)
f = fraction, 52 bits
It is often useful to be able to investigate the behavior of a calculation at the bit-level and the library provides functions for printing the IEEE representations in a human-readable form.
-
void
cml_ieee754_fprintf_float
(FILE * stream, const float * x)¶ -
void
cml_ieee754_fprintf_double
(FILE * stream, const double * x)¶ These functions output a formatted version of the IEEE floating-point number pointed to by
x
to the streamstream
. A pointer is used to pass the number indirectly, to avoid any undesired promotion fromfloat
todouble
. The output takes one of the following forms,NaN
the Not-a-Number symbolInf, -Inf
positive or negative infinity1.fffff...*2^E, -1.fffff...*2^E
a normalized floating point number0.fffff...*2^E, -0.fffff...*2^E
a denormalized floating point number0, -0
positive or negative zeroThe output can be used directly in GNU Emacs Calc mode by preceding it with
2#
to indicate binary.
-
void
cml_ieee754_printf_float
(const float * x)¶ -
void
cml_ieee754_printf_double
(const double * x)¶ These functions output a formatted version of the IEEE floating-point number pointed to by
x
to the streamstdout
.
The following program demonstrates the use of the functions by printing
the single and double precision representations of the fraction
. For comparison the representation of the value promoted from
single to double precision is also printed.
#include <stdio.h>
#include <cml.h>
int
main(void)
{
float f = 1.0/3.0;
double d = 1.0/3.0;
double fd = f; /* promote from float to double */
printf(" f = ");
cml_ieee754_printf_float(&f);
printf("\n");
printf("fd = ");
cml_ieee754_printf_double(&fd);
printf("\n");
printf(" d = ");
cml_ieee754_printf_double(&d);
printf("\n");
return 0;
}
The binary representation of is
. The
output below shows that the IEEE format normalizes this fraction to give
a leading digit of 1:
f = 1.01010101010101010101011*2^-2
fd = 1.0101010101010101010101100000000000000000000000000000*2^-2
d = 1.0101010101010101010101010101010101010101010101010101*2^-2
The output also shows that a single-precision number is promoted to double-precision by adding zeros in the binary representation.
References and Further Reading¶
The reference for the IEEE standard is,
- ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.
A more pedagogical introduction to the standard can be found in the following paper,
- David Goldberg: What Every Computer Scientist Should Know About Floating-Point Arithmetic. ACM Computing Surveys, Vol.: 23, No.: 1 (March 1991), pages 5–48.
- Corrigendum: ACM Computing Surveys, Vol.: 23, No.: 3 (September 1991), page 413. and see also the sections by B. A. Wichmann and Charles B. Dunham in Surveyor’s Forum: “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. ACM Computing Surveys, Vol.: 24, No.: 3 (September 1992), page 319.
A detailed textbook on IEEE arithmetic and its practical use is available from SIAM Press,
- Michael L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, SIAM Press, ISBN 0898715717.