So, a friend told me about one of his interview questions: reverse an array in-place using C. Also, use the absolute minimum amount of memory.

Reversing the array is the easy part. Attack it from both of its sides, and just swap the elements, and move one step towards the center. But swapping is the tricky part.

Initially, I wrote this:

void rev(void) {
    int a[] = {1,2,3};

    int tmp;
    tmp = a[0];
    a[0] = a[2];
    a[2] = tmp;
}

(This is just a dummy snippet to show the logic behind the swapping part)

Is this minimal memory? According to interviewers, no. So my buddy told me that I had to ditch the tmp variable. Time to try something else. Lets try to use arithmetic.

void rev(void) {
    int a[] = {1,2,3};

    a[0] = a[0] + a[2]; // {4, 2, 3}
    a[2] = a[0] - a[2]; // {4 ,2, 1}
    a[0] = a[0] - a[2]; // {3, 2, 1}
}

This also "isn't a valid solution", according to my friend, because an overflow can occur. Hmm, okay, fair enough. The final solution is given using xor:

int main(void) {
    int a[] = {1,2,3};


    a[0] = a[0] ^ a[2];
    a[2] = a[0] ^ a[2];
    a[0] = a[0] ^ a[2];
}

This is a cool trick, I have to admit. But it got me scratching my head, does this use less memory?

Let's compile this to x86_64 using clang 9.0.0 (gcc also gives the same results in these examples).

First, lets try using a tmp variable:

tmp = a[0];
a[0] = a[2];
a[2] = tmp;

which gives:

mov     edx, dword ptr [rbp - 20]
mov     dword ptr [rbp - 24], edx
mov     edx, dword ptr [rbp - 12]
mov     dword ptr [rbp - 20], edx
mov     edx, dword ptr [rbp - 24]
mov     dword ptr [rbp - 12], edx

Okay. We do require an extra register to hold the tmp value (here it is edx), and we could nitpick that an extra register (especially a volatile one such as edx) doesn't really count as "using more memory".

But lets look at the XOR approach:

a[0] = a[0] ^ a[2];
a[2] = a[0] ^ a[2];
a[0] = a[0] ^ a[2];

which gives:

mov     edx, dword ptr [rbp - 16]
xor     edx, dword ptr [rbp - 8]
mov     dword ptr [rbp - 16], edx
mov     edx, dword ptr [rbp - 16]
xor     edx, dword ptr [rbp - 8]
mov     dword ptr [rbp - 8], edx
mov     edx, dword ptr [rbp - 16]
xor     edx, dword ptr [rbp - 8]
mov     dword ptr [rbp - 16], edx

First of all, we see that this approach uses 3 more instructions compared to the tmp approach. But okay, the task requires minimum memory, not CPU cycles.

But if we dive in deeper, this approach also uses the edx register! Even if we claimed that using a register is the same as using memory, XOR approach isn't more memory efficient than the tmp approach. In order to XOR an array of any given length, the value has to go to a register first. After getting XOR-ed it leaves the register and heads back to memory. All in all, from what I can tell, both solutions are equivalent memory wise, with the XOR solution yielding worse CPU performance (and being less readable).

edit:
Someone complained that there were no benchmarks and that a slightly larger instruction count doesn't prove anything. Somewhat of a fair point. On my Core i5-3340, 10 million iterations of the XOR approach takes 2.876 s, while the tmp approach takes 2.605 s, with minimal jitter of less than a ms.