These functions are fully configurable versions of fpmul and fpmac. The output can always be considered to have eight values, because the real and imaginary parts of each complex float are treated as separate lanes. For a v4cfloat, the loop iterates over real0 - complex0 - real1 - complex1 … This capability is introduced to allow flexibility and to implement operations on conjugates.
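The eight-lane view of a complex vector can be illustrated in plain C++, without the AIE intrinsics themselves. This is a minimal sketch under stated assumptions: `to_lanes` and `negate_lanes` are hypothetical helpers that do not exist in the intrinsics API, and the mapping of mask bit i to lane i is an assumption loosely modelled on the addmask parameter.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: flatten four complex values into eight float lanes,
// ordered real0, imag0, real1, imag1, ...
std::array<float, 8> to_lanes(const std::array<std::complex<float>, 4>& v) {
    std::array<float, 8> lanes{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        lanes[2 * i]     = v[i].real();
        lanes[2 * i + 1] = v[i].imag();
    }
    return lanes;
}

// Hypothetical helper: negate every lane whose bit is set in mask.
// Assumption: bit i of mask corresponds to lane i.
std::array<float, 8> negate_lanes(std::array<float, 8> lanes, unsigned mask) {
    assert(mask <= 0xFFu);  // only eight lanes exist
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if ((mask >> i) & 1u) {
            lanes[i] = -lanes[i];
        }
    }
    return lanes;
}
```

With the mask 0xAA (bits 1, 3, 5 and 7 set), `negate_lanes` flips only the imaginary lanes, which yields the complex conjugate of each of the four elements.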
v8float fpmac_conf(v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int & cmp)
Returns the multiplication result.
Parameter | Comment |
---|---|
acc | Current accumulator value. This parameter does not exist for fpmul_conf. |
xbuf | First multiplication input buffer. |
xstart | Starting offset for all lanes of X. |
xoffs | 4 bits per lane: Additional lane-dependent offset for X. |
zbuf | Optional second multiplication input buffer. If zbuf is not specified, xbuf is used as the second buffer. |
zstart | Starting offset for all lanes of Z. This must be a compile time constant. |
zoffs | 4 bits per lane: Additional lane-dependent offset for Z. |
ones | If true, all lanes from Z are replaced with 1.0. |
abs | If true, the absolute value is taken before accumulation. |
addmode | Select one of fpadd_add (all add), fpadd_sub (all sub), fpadd_mixadd or fpadd_mixsub (add-sub or sub-add pairs). This must be a compile time constant. |
addmask | 8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode). |
cmpmode | Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" for the maximum, and "fpcmp_nrm" for the usual sum. |
cmp | Optional. 8 x 1 LSB bits: when "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen. |
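
The per-lane offsets and the compare output in the table above can be sketched with a scalar model in plain C++. This is not the hardware definition: `select_lanes` and `select_min` are hypothetical names, and the sketch assumes that lane 0 reads the least significant nibble of the offset word, that indices wrap at the end of the buffer, and that in the minimum mode the accumulator is chosen only when it is strictly smaller than the product.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<float, 8>;

// Hypothetical model of xstart/xoffs (or zstart/zoffs) lane selection.
// Assumptions: lane 0 uses the least significant nibble of offs, and the
// resulting index wraps around the end of the buffer.
template <std::size_t N>
Lanes select_lanes(const std::array<float, N>& buf, int start, std::uint32_t offs) {
    static_assert(N > 0, "buffer must not be empty");
    assert(start >= 0);
    Lanes out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        const std::uint32_t nibble = (offs >> (4 * lane)) & 0xFu;
        out[lane] = buf[(static_cast<std::size_t>(start) + nibble) % N];
    }
    return out;
}

// Hypothetical model of a per-lane minimum with a compare mask.
// Assumption: the accumulator is chosen only when it is strictly smaller
// than the product; bit i of cmp is set when lane i kept the accumulator.
Lanes select_min(const Lanes& acc, const Lanes& prod, unsigned& cmp) {
    Lanes out{};
    cmp = 0;
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        const bool keep_acc = acc[lane] < prod[lane];
        out[lane] = keep_acc ? acc[lane] : prod[lane];
        if (keep_acc) {
            cmp |= 1u << lane;
        }
    }
    return out;
}
```

In this model, a start of 2 with the offset word 0x76543210 makes lane i read buffer element 2 + i, while an offset word of 0 makes every lane read element 2.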