I have two matrices, corresponding to data points `(x,y1)` and `(x,y2)`:

```
x | y1
------------
0 | 0
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5

x | y2
----------------
0.5 | 0.5
1.5 | 1.5
2.5 | 2.5
3.5 | 3.5
4.5 | 4.5
5.5 | 5.5
```

I'd like to create a new matrix that combines the `x` values into a single column, and has `NaN`s in the appropriate `y1`, `y2` columns:

```
x | y1 | y2
-----------------------------
0 | 0 | NaN
0.5 | NaN | 0.5
1 | 1 | NaN
1.5 | NaN | 1.5
... | ... | ...
5 | 5 | NaN
5.5 | NaN | 5.5
```

Is there an easy way to do this? I'm new to Python and NumPy (coming from MATLAB) and I'm not sure how I would even begin with this. (For reference, my approach to this in MATLAB is simply using an `outerjoin` against two tables that are generated with `array2table`.)

cᴏʟᴅsᴘᴇᴇᴅ 11/18/2017.

If you can load your data into separate `pandas` dataframes, this becomes simple.

```
df
x y1
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
df2
x y2
0 0.5 0.5
1 1.5 1.5
2 2.5 2.5
3 3.5 3.5
4 4.5 4.5
5 5.5 5.5
```

Perform an outer `merge`, and sort on the `x` column.

```
df = df.merge(df2, how='outer').sort_values('x')
df
x y1 y2
0 0 0 NaN
6 0.5 NaN 0.5
1 1 1 NaN
7 1.5 NaN 1.5
2 2 2 NaN
8 2.5 NaN 2.5
3 3 3 NaN
9 3.5 NaN 3.5
4 4 4 NaN
10 4.5 NaN 4.5
5 5 5 NaN
11 5.5 NaN 5.5
```

If you want an array, call `.values` on the result:

```
df.values
array([[0.0, 0.0, nan],
[0.5, nan, 0.5],
[1.0, 1.0, nan],
[1.5, nan, 1.5],
[2.0, 2.0, nan],
[2.5, nan, 2.5],
[3.0, 3.0, nan],
[3.5, nan, 3.5],
[4.0, 4.0, nan],
[4.5, nan, 4.5],
[5.0, 5.0, nan],
[5.5, nan, 5.5]], dtype=object)
```
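For readers starting from NumPy arrays rather than existing dataframes, the steps above can be sketched end to end. The array names `a1` and `a2` and the variable `result` are assumptions made for illustration; the inputs are cast to float so both `x` columns share a dtype, and `to_numpy(dtype=float)` is used in place of `.values` so the result is a float array rather than an object array.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: two (6, 2) arrays holding (x, y1) and (x, y2).
x = np.arange(6, dtype=float)
a1 = np.column_stack((x, x))
a2 = a1 + 0.5

df1 = pd.DataFrame(a1, columns=["x", "y1"])
df2 = pd.DataFrame(a2, columns=["x", "y2"])

# Outer merge on x, then sort by x and renumber the rows.
merged = (
    df1.merge(df2, on="x", how="outer")
       .sort_values("x")
       .reset_index(drop=True)
)

# Convert back to a plain float array; missing entries are NaN.
result = merged.to_numpy(dtype=float)
print(result)
```

Passing `on="x"` explicitly is a design choice here: it makes the join key visible instead of relying on the shared column name being picked up automatically.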

Eric Duminil 11/18/2017

Nice. Using pandas makes sense here. You basically need a mix of numpy arrays and Python dicts.

cᴏʟᴅsᴘᴇᴇᴅ 11/18/2017

@EricDuminil Thank you. It would seem the most painless option to me. However, I saw your answer which seemed pretty impressive (I couldn't have thought of a numpy solution as you did) and passed you an upvote :)

Eric Duminil 11/18/2017.

Here's an attempt with plain `numpy`. It creates a matrix with 3 columns and as many rows as `a1` and `a2` combined. It writes `a1` and `a2` into the columns, and sorts the rows by their first value.

Note that it only works if the `x` values are disjoint:

```
import numpy as np
x = np.arange(6)
# array([0, 1, 2, 3, 4, 5])
a1 = np.vstack((x,x)).T
# array([[0, 0],
# [1, 1],
# [2, 2],
# [3, 3],
# [4, 4],
# [5, 5]])
a2 = a1 + 0.5
# array([[ 0.5, 0.5],
# [ 1.5, 1.5],
# [ 2.5, 2.5],
# [ 3.5, 3.5],
# [ 4.5, 4.5],
# [ 5.5, 5.5]])
m = np.empty((12, 3))
m[:] = np.nan
# array([[ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan]])
m[:6, :2] = a1
# array([[ 0., 0., nan],
# [ 1., 1., nan],
# [ 2., 2., nan],
# [ 3., 3., nan],
# [ 4., 4., nan],
# [ 5., 5., nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan]])
m[6:, ::2] = a2
# array([[ 0. , 0. , nan],
# [ 1. , 1. , nan],
# [ 2. , 2. , nan],
# [ 3. , 3. , nan],
# [ 4. , 4. , nan],
# [ 5. , 5. , nan],
# [ 0.5, nan, 0.5],
# [ 1.5, nan, 1.5],
# [ 2.5, nan, 2.5],
# [ 3.5, nan, 3.5],
# [ 4.5, nan, 4.5],
# [ 5.5, nan, 5.5]])
m[m[:,0].argsort()]
# array([[ 0. , 0. , nan],
# [ 0.5, nan, 0.5],
# [ 1. , 1. , nan],
# [ 1.5, nan, 1.5],
# [ 2. , 2. , nan],
# [ 2.5, nan, 2.5],
# [ 3. , 3. , nan],
# [ 3.5, nan, 3.5],
# [ 4. , 4. , nan],
# [ 4.5, nan, 4.5],
# [ 5. , 5. , nan],
# [ 5.5, nan, 5.5]])
```

Using pandas is the correct method here.
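If the `x` values may overlap, one possible variation (not part of the answer above) is to build the combined `x` column with `np.union1d` and place each `y` value with `np.searchsorted`. The helper name `outer_join` is hypothetical, and the sketch assumes that `x` values are unique within each input and that matching values compare exactly equal as floats.

```python
import numpy as np

def outer_join(a1, a2):
    """Outer-join two (n, 2) arrays on their first column (hypothetical helper).

    Assumes x values are unique within each input array.
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)

    # Sorted, de-duplicated union of both x columns.
    x = np.union1d(a1[:, 0], a2[:, 0])

    # Start with NaN everywhere, then fill in what is known.
    out = np.full((x.size, 3), np.nan)
    out[:, 0] = x

    # searchsorted gives the row of each input x inside the union.
    out[np.searchsorted(x, a1[:, 0]), 1] = a1[:, 1]
    out[np.searchsorted(x, a2[:, 0]), 2] = a2[:, 1]
    return out

# x = 1 appears in both inputs, so that row gets both y1 and y2.
demo = outer_join([[0, 0], [1, 1], [2, 2]], [[1, 10], [2.5, 25]])
print(demo)
```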

hpaulj 11/18/2017.

A structured array approach (incomplete):

Import a special library of recfunctions:

`In [441]: import numpy.lib.recfunctions as rf`

Define two structured arrays

`In [442]: A = np.zeros((6,),[('x',int),('y',int)])`

Oops, the `'x'` keys in `B` are float, so for consistency, let's make the `A` ones float as well. Don't mix floats and ints unnecessarily.

```
In [446]: A = np.zeros((6,),[('x',float),('y',int)])
In [447]: A['x']=np.arange(6)
In [448]: A['y']=np.arange(6)
In [449]: A
Out[449]:
array([( 0., 0), ( 1., 1), ( 2., 2), ( 3., 3), ( 4., 4), ( 5., 5)],
dtype=[('x', '<f8'), ('y', '<i4')])
In [450]: B = np.zeros((6,),[('x',float),('z',float)])
In [451]: B['x']=np.linspace(.5,5.5,6)
In [452]: B['z']=np.linspace(.5,5.5,6)
In [453]: B
Out[453]:
array([( 0.5, 0.5), ( 1.5, 1.5), ( 2.5, 2.5), ( 3.5, 3.5),
( 4.5, 4.5), ( 5.5, 5.5)],
dtype=[('x', '<f8'), ('z', '<f8')])
```

Look at the docs of the `rf.join_by` function:

`In [454]: rf.join_by?`

Do an `outer` join:

```
In [457]: rf.join_by('x',A,B,'outer')
Out[457]:
masked_array(data = [(0.0, 0, --) (0.5, --, 0.5) (1.0, 1, --) (1.5, --, 1.5) (2.0, 2, --)
(2.5, --, 2.5) (3.0, 3, --) (3.5, --, 3.5) (4.0, 4, --) (4.5, --, 4.5)
(5.0, 5, --) (5.5, --, 5.5)],
mask = [(False, False, True) (False, True, False) (False, False, True)
(False, True, False) (False, False, True) (False, True, False)
(False, False, True) (False, True, False) (False, False, True)
(False, True, False) (False, False, True) (False, True, False)],
fill_value = ( 1.00000000e+20, 999999, 1.00000000e+20),
dtype = [('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
```

The result is a masked array, with the missing values masked.

Same thing, but with masking turned off:

```
In [460]: rf.join_by('x',A,B,'outer',usemask=False)
Out[460]:
array([( 0. , 0, 1.00000000e+20), ( 0.5, 999999, 5.00000000e-01),
( 1. , 1, 1.00000000e+20), ( 1.5, 999999, 1.50000000e+00),
( 2. , 2, 1.00000000e+20), ( 2.5, 999999, 2.50000000e+00),
( 3. , 3, 1.00000000e+20), ( 3.5, 999999, 3.50000000e+00),
( 4. , 4, 1.00000000e+20), ( 4.5, 999999, 4.50000000e+00),
( 5. , 5, 1.00000000e+20), ( 5.5, 999999, 5.50000000e+00)],
dtype=[('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
```

Now we see the fill values explicitly. There must be a way of replacing the `1e20` with `np.nan`. Replacing `999999` with `nan` is messier, since `np.nan` is a float value, not integer.

Under the cover this `join_by` is probably first creating a `blank` array with the `join` `dtype`, and filling in fields one by one.
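One way to get `NaN` instead of the default fill values might be to keep the masked result and fill each field with `np.nan`. This is a sketch under one assumption that differs from the session above: the `y` field is declared as float rather than int, which sidesteps the integer problem just mentioned. The names `joined` and `result` are illustrative.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same data as above, but with every field declared as float (an assumption
# made here so that NaN can be stored in all columns).
A = np.zeros(6, dtype=[("x", float), ("y", float)])
A["x"] = np.arange(6)
A["y"] = np.arange(6)

B = np.zeros(6, dtype=[("x", float), ("z", float)])
B["x"] = np.linspace(0.5, 5.5, 6)
B["z"] = np.linspace(0.5, 5.5, 6)

# Outer join; with the default usemask=True the result is a masked array.
joined = rf.join_by("x", A, B, jointype="outer")

# Replace masked entries with NaN field by field, then stack into a 2-D array.
columns = [np.ma.filled(joined[name], np.nan) for name in ("x", "y", "z")]
result = np.column_stack(columns)

# Sort explicitly by x so the row order does not depend on join_by internals.
result = result[np.argsort(result[:, 0])]
print(result)
```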

kmcodes 11/18/2017.

Considering you may not need pandas for anything else, this is the standard lib solution.

I would break it down into two lists of lists (assuming the order of elements is important). So

```
xy1 = [[0,0],[1,1],......]
xy2 = [[0.5,0.5],[1.5,1.5],.......]
```

Then merge these lists into a list `x`, adding "NaN" alternately at position `x[i][1]` or `x[i][2]` to stand in for the column where no value is present. Each `x[i][0]` then becomes the key of a dictionary element whose value is the two-element list described above.

```
finalx = {item[0]: item[1:] for item in x}
# finalx == {0: [0, 'NaN'], 0.5: ['NaN', 0.5], ......}
```

Hope this helps. This is more of a direction than a solution.
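The direction described above could be fleshed out roughly as follows, using only built-in types. This is a sketch, not the answerer's code: it uses `float("nan")` instead of the string `"NaN"`, fills the dictionary directly rather than building the intermediate list `x`, and the names `merged` and `rows` are made up for illustration.

```python
nan = float("nan")

# The two inputs as lists of [x, y] pairs.
xy1 = [[x, x] for x in range(6)]
xy2 = [[x + 0.5, x + 0.5] for x in range(6)]

# Map each x to a [y1, y2] pair, starting from NaN for both slots.
merged = {}
for x, y1 in xy1:
    merged.setdefault(x, [nan, nan])[0] = y1
for x, y2 in xy2:
    merged.setdefault(x, [nan, nan])[1] = y2

# Flatten back into sorted [x, y1, y2] rows.
rows = [[x] + merged[x] for x in sorted(merged)]
for row in rows:
    print(row)
```

Because the dictionary is keyed on `x`, an `x` value that appears in both inputs ends up as a single row with both `y` slots filled.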


`outerjoin`. `numpy` this is as awkward as using just `matrix` in MATLAB. I can approximate it with structured arrays (and `recfunctions.join`), which have some similarities to a MATLAB `struct` (see stackoverflow.com/questions/47277436/…). `pandas` is better for `table`-like operations.