Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Division by negative numbers returns "division by zero" #770

Closed
leo-ard opened this issue Jul 23, 2024 · 10 comments
Closed

Division by negative numbers returns "division by zero" #770

leo-ard opened this issue Jul 23, 2024 · 10 comments
Labels
blocker This had better be fixed before releasing bug Something is not working

Comments

@leo-ard
Copy link

leo-ard commented Jul 23, 2024

Doing an arithmetic expansion with a division where the dividend is a negative number returns a "division by zero error" :

> ksh
$ echo $(( 10 / -10 ))
ksh:  10 / -10 : divide by zero
$ echo $(( 10 / -1 ))
ksh:  10 / -1 : divide by zero
$ echo $(( 10 / -2 ))
ksh:  10 / -2 : divide by zero

Tested with version sh (AT&T Research) 93u+ 2012-08-01 of ksh. Running on Macos M2 14.1.1 (23B81)

@McDutchie McDutchie added bug Something is not working blocker This had better be fixed before releasing labels Jul 23, 2024
@McDutchie
Copy link

Still reproducible in 93u+m, unfortunately. Thanks for the report.

Kind of amazing it took this long for someone to notice!

@McDutchie
Copy link

McDutchie commented Jul 23, 2024

So, we forked from the last "stable" (ha!) AT&T release, 93u+ 2012-08-01, inheriting this unfortunate bug.

Testing my stash of compiled historic AT&T ksh versions shows that the bug was introduced in 93q 2005-01-31 and fixed in 93v- 2013-06-28.

So, with the help of the ast-open-archive repo, we should be able to isolate and backport the fix from that beta version.

@McDutchie
Copy link

McDutchie commented Jul 23, 2024

This is likely the relevant RELEASE file entry from the 2013-06-28 commit:

13-06-04  A bug with the arithmetic expression $((2**63 / -1)) on some
         systems has been fixed.

And here is the 93v- 2013-06-28 change that fixes the bug, as a patch to 93u+m:

diff --git a/src/cmd/ksh93/sh/streval.c b/src/cmd/ksh93/sh/streval.c
index 79f75ee14..ac09e3d01 100644
--- a/src/cmd/ksh93/sh/streval.c
+++ b/src/cmd/ksh93/sh/streval.c
@@ -351,7 +351,7 @@ Sfdouble_t	arith_exec(Arith_t *ep)
 				num = sp[-1]/num;
 				type = 1;
 			}
-			else if((Sfulong_t)(num)==0)
+			else if((Sfulong_t)(num < 0 ? -num : num)==0)
 				arith_error(e_divzero,ep->expr,ep->emode);
 			else if(type==2 || tp[-1]==2)
 				num = U2F((Sfulong_t)(sp[-1]) / (Sfulong_t)(num));

So, the bug was caused by the fact that a floating-point variable with a negative value (num) was typecast to an unsigned integer type (Sfulong_t which is unsigned long long on most systems).

Converting a negative integer to an unsigned integer has implementation-defined results in C. However, as I am learning right now, casting a negative float to an unsigned integer is undefined behaviour according to the C standard. In practice, it behaves differently on Intel systems than it does on ARM systems (including Apple's M1/M2 processors). On ARM systems, the result of the undefined typecast is simply always 0, resulting in this bug.

The patch fixes it by making sure that the number is not negative before typecasting it.

Relevant reading (this article is about C++ but the undefined behaviour is the same in C):
https://embeddeduse.com/2013/08/25/casting-a-negative-float-to-an-unsigned-int/

@leo-ard
Copy link
Author

leo-ard commented Jul 24, 2024

Wow, what a sneaky bug. Thanks for sharing the debugging process ! It's crazy how little details in the C specification can cause bugs years later.

If you are interested, I found this bug working on pnut, a C to Shell compiler (yes you read this right). We had some division tests that were failing, flagging this bug in the underlying shell implementation. Among the shells we run on, Ksh is often the fastest one (and sometimes by far)

@McDutchie
Copy link

Workaround: make one of the numbers a floating point type by adding a .0 or ,0 (depending on the locale). Assuming LC_NUMERIC=C (or another locale that uses .), the following works without a patch:

$ echo $(( 10.0 / -10 ))                                                                                                           
-1
$ echo $(( 10 / -1.0 ))
-10
$ echo $(( 10 / -2.0 ))
-5

@leo-ard
Copy link
Author

leo-ard commented Jul 24, 2024

Oh nice trick. My workaround was to negate the divisor and the result when the divisor was negative :

#Lets say we want to divide x by y

if [ $y -lt 0 ] ; then 
    result=$((-(x / -y)))
else 
    result=$((x / y)) 
fi

@McDutchie
Copy link

McDutchie commented Jul 24, 2024

I introduced a similar bug with the same undefined behaviour myself in 4592859 (see there for explanation). On ARM systems:

$ typeset -ui i
$ echo $((i = -5))
0
$ echo $i
4294967291

The assignment worked fine (wrapped around as expected for an unsigned integer), but the value of the assignment was 0 and should have been the same as the variable's new value.

So lines 272, 274 and 276 have undefined behaviour for a negative n:

else if((attr & NV_UINT64)==NV_UINT64) /* long unsigned integer */
r = (uintmax_t)n;
else if((attr & NV_UINT16)==NV_UINT16) /* short unsigned integer */
r = (uint16_t)n;
else if((attr & NV_UINT32)==NV_UINT32) /* normal unsigned integer */
r = (uint32_t)n;

@McDutchie
Copy link

Patch to test for that other problem: to avoid undefined behaviour, invert the sign of a negative number before the typecast, then invert it again after. Maybe this is naive.

diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 01703e45f..6327b6865 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -269,11 +269,11 @@ static Sfdouble_t arith(const char **ptr, struct lval *lvalue, int type, Sfdoubl
 		else if((attr & NV_DOUBLE)==NV_DOUBLE)		/* normal float */
 			r = (double)n;
 		else if((attr & NV_UINT64)==NV_UINT64)		/* long unsigned integer */
-			r = (uintmax_t)n;
+			r = n < 0 ? -((uintmax_t)-n) : (uintmax_t)n;
 		else if((attr & NV_UINT16)==NV_UINT16)		/* short unsigned integer */
-			r = (uint16_t)n;
+			r = n < 0 ? -((uint16_t)-n) : (uint16_t)n;
 		else if((attr & NV_UINT32)==NV_UINT32)		/* normal unsigned integer */
-			r = (uint32_t)n;
+			r = n < 0 ? -((uint32_t)-n) : (uint32_t)n;
 		else if((attr & NV_INT64)==NV_INT64)		/* long signed integer */
 			r = (intmax_t)n;
 		else if((attr & NV_INT16)==NV_INT16)		/* short signed integer */

@McDutchie
Copy link

McDutchie commented Jul 24, 2024

Thinking through that second patch. I think it's actually fine, if not obvious. If n is negative, we invert the sign using the unary minus, do the typecast, and then do another unary minus operation -- at which point we're performing a unary minus on a positive unsigned integer value, which is well-defined behaviour in C and yields a positive wrapped-around unsigned integer value, which is fine to implicitly typecast back to r's float type.

@McDutchie
Copy link

There are more terrible arithmetic problems that are a lot harder to solve, but that's for a separate issue, see #771.

McDutchie added a commit that referenced this issue Jul 24, 2024
*** BUG 1 ***

Léonard Oest O'Leary (@leo-ard) reports:
> Doing an arithmetic expansion with a division where the dividend
> is a negative number returns a "division by zero error" :
>
>     > ksh
>     $ echo $(( 10 / -10 ))
>     ksh:  10 / -10 : divide by zero
>     $ echo $(( 10 / -1 ))
>     ksh:  10 / -1 : divide by zero
>     $ echo $(( 10 / -2 ))
>     ksh:  10 / -2 : divide by zero
>
> Tested with version sh (AT&T Research) 93u+ 2012-08-01 of ksh.
> Running on Macos M2 14.1.1 (23B81)

Turns out this bug is reproducible on all ARM systems. It was
inherited from 93u+ 2012-08-01. Testing my stash of compiled
historic AT&T ksh versions (on a Debian arm64 VM) shows that the
bug was introduced in 93q 2005-01-31 and fixed in 93v- 2013-06-28.

Analysis: All shell arithmetic is internally done with the
maximum-length float type that the system supports (Sfdouble_t).
Since comparison with floats is inexact, ksh converts the numnber
to Sfulong_t, the largest unsigned integer type supported by the
system, before comparing it to zero and throwing a "divide by
zero". In streval.c:

354:	else if((Sfulong_t)(num)==0)
355:		arith_error(e_divzero,ep->expr,ep->emode);

However, converting a negative float value to an unsigned integer
type is undefined behaviour in the C standard. On (at least) ARM
systems and IBM POWER systems [*1], the result is always zero in
that case, which causes this bug.

src/cmd/ksh93/sh/streval.c: arith_exec():
- Backport the 93v- 2013-06-28 fix: make sure that the number is
  not negative before typecasting it and compating it to zero.

*** BUG 2 ***

I introduced a similar bug with the same undefined behaviour myself
in 4592859 (see there for explanation). On ARM systems:

    $ typeset -ui i
    $ echo $((i = -5))
    0
    $ echo $i
    4294967291

The assignment worked fine (wrapped around as expected for an
unsigned integer), but the value of the assignment was 0 and should
have been the same as the variable's new value.

So lines 272, 274 and 276 in arith.c have undefined behaviour for a
negative n (which is of the Sfdouble_t float type):

271: else if((attr & NV_UINT64)==NV_UINT64) /* long unsigned integer */
272: 	r = (uintmax_t)n;
273: else if((attr & NV_UINT16)==NV_UINT16) /* short unsigned integer */
274: 	r = (uint16_t)n;
275: else if((attr & NV_UINT32)==NV_UINT32) /* normal unsigned integer */
276: 	r = (uint32_t)n;

src/cmd/ksh93/sh/arith.c: arith():
- Before doing those typecasts, check if n is negative. If so,
  invert the sign using the unary minus (yielding a positive
  float), do the typecast, and then do another unary minus
  operation on the resulting positive unsigned integer value --
  which is well-defined behaviour in C [*2] and yields a positive
  wrapped-around unsigned integer value, which is fine to
  implicitly typecast back to r's float type.

Resolves: #770

[*1] https://www.ibm.com/support/pages/assigning-negative-float-value-unsigned-integral-type-yields-undefined-value-stored-type
[*2] https://stackoverflow.com/questions/8026694/c-unary-minus-operator-behavior-with-unsigned-operands
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This had better be fixed before releasing bug Something is not working
Projects
None yet
Development

No branches or pull requests

2 participants