Merge multiple DataFrames Pandas


This might be considered a duplicate of a thorough explanation of various approaches; however, I can't find a solution to my problem there because I have a larger number of DataFrames.

I have multiple DataFrames (more than 10), each differing in one column, VARX. Here is a quick, oversimplified example:

import pandas as pd

df1 = pd.DataFrame({'depth': [0.500000, 0.600000, 1.300000],
       'VAR1': [38.196202, 38.198002, 38.200001],
       'profile': ['profile_1', 'profile_1','profile_1']})

df2 = pd.DataFrame({'depth': [0.600000, 1.100000, 1.200000],
       'VAR2': [0.20440, 0.20442, 0.20446],
       'profile': ['profile_1', 'profile_1','profile_1']})

df3 = pd.DataFrame({'depth': [1.200000, 1.300000, 1.400000],
       'VAR3': [15.1880, 15.1820, 15.1820],
       'profile': ['profile_1', 'profile_1','profile_1']})

Each df has the same or different depths for the same profiles, so I need to create a new DataFrame that merges all the separate ones, using depth and profile as the key columns and keeping every depth value that appears for each profile.

The VARX value should therefore be NaN wherever there is no depth measurement of that variable for that profile.

The result should thus be a new, compressed DataFrame with all VARX columns added alongside the depth and profile columns, something like this:

profile     depth     VAR1       VAR2      VAR3
profile_1   0.500000  38.196202  NaN       NaN
profile_1   0.600000  38.198002  0.20440   NaN
profile_1   1.100000  NaN        0.20442   NaN
profile_1   1.200000  NaN        0.20446   15.1880
profile_1   1.300000  38.200001  NaN       15.1820
profile_1   1.400000  NaN        NaN       15.1820

Note that the actual number of profiles is much, much bigger.

Any ideas?

edited 12 hours ago

asked 16 hours ago by PEBKAC


python pandas dataframe


5 Answers

Consider setting an index on each DataFrame and then running a horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

print(pd.concat(dfs, axis=1).reset_index())
#      profile  depth       VAR1     VAR2    VAR3
# 0  profile_1    0.5  38.196202      NaN     NaN
# 1  profile_1    0.6  38.198002  0.20440     NaN
# 2  profile_1    1.1        NaN  0.20442     NaN
# 3  profile_1    1.2        NaN  0.20446  15.188
# 4  profile_1    1.3  38.200001      NaN  15.182
# 5  profile_1    1.4        NaN      NaN  15.182
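A self-contained version of the above, for anyone who wants to run it directly. It rebuilds df1, df2 and df3 from the question; the only addition is sort_index(), used here on the assumption that you want rows ordered by profile and depth regardless of how your pandas version orders the union of the indexes.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Move the key columns into a (profile, depth) MultiIndex so concat can align on it.
indexed = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

# axis=1 outer-joins the indexes; sort_index() pins the row order.
result = pd.concat(indexed, axis=1).sort_index().reset_index()
print(result)
```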

answered 15 hours ago by Parfait

That's awesome, thank you! How would you do it within a loop, where I read a separate file for each df? For example: for m in range(len(myfiles)): df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

Ah, my mistake: do not put m in brackets, which casts it as a list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago
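A runnable sketch of the comment's file-reading loop. The io.StringIO objects are hypothetical stand-ins for the entries of myfiles (normally file paths), and index_col=[0, 1] only works if profile and depth are the first two columns of every file.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files; in practice these are paths.
myfiles = [
    io.StringIO("profile,depth,VAR1\nprofile_1,0.5,38.196202\nprofile_1,0.6,38.198002\n"),
    io.StringIO("profile,depth,VAR2\nprofile_1,0.6,0.20440\nprofile_1,1.1,0.20442\n"),
]

# Iterate over the sources directly and let read_csv build the (profile, depth) index.
dfs = [pd.read_csv(m, index_col=[0, 1]) for m in myfiles]

merged = pd.concat(dfs, axis=1).sort_index().reset_index()
print(merged)
```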

You have multiple rows with the same profile AND depth. Your post originally had the same issue, and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duplicating or aggregating before setting the index and concatenating.

– Parfait
14 hours ago
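A minimal sketch of the aggregation route. The duplicated key reproduces the question's pre-edit df1 as described in the comment (depth 0.6 twice); averaging is an assumed choice, and first(), max() or another reduction may fit your data better.

```python
import pandas as pd

# Pre-edit version of df1: (profile_1, 0.6) appears twice.
df1 = pd.DataFrame({'depth': [0.6, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})


def one_row_per_key(df, keys=('profile', 'depth')):
    """Return df indexed by keys, averaging any rows that share a key."""
    return df.groupby(list(keys)).mean()


dfs = [one_row_per_key(df) for df in [df1, df2]]
# Aligning frames on their index requires each key to be unique.
assert all(d.index.is_unique for d in dfs)

result = pd.concat(dfs, axis=1).sort_index().reset_index()
print(result)
```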

I believe that is a different question, and you already accepted a solution here (which, come to think of it, may result in duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

You should close this one out, as the answers here do resolve your immediate question using the posted data. The data size, and even duplicate rows in the data, are a different question.

– Parfait
12 hours ago


Or using merge:

from functools import partial, reduce

dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)



    depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182
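If duplicate (depth, profile) keys are a concern, as they turned out to be for the asker, pd.merge's validate argument can make the reduce fail loudly instead of silently multiplying rows. A sketch; the final sort is an addition, on the assumption that ordered rows are wanted, since outer-merge row order may differ between pandas versions.

```python
from functools import partial, reduce

import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# validate='one_to_one' raises pandas.errors.MergeError if a (depth, profile)
# key repeats on either side of any of the merges.
merge = partial(pd.merge, on=['depth', 'profile'], how='outer',
                validate='one_to_one')

result = (reduce(merge, [df1, df2, df3])
          .sort_values(['profile', 'depth'])
          .reset_index(drop=True))
print(result)
```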

Update

For merging the dataframes in a loop as suggested in the comments, you could do something like:

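# Start from the first frame: seeding with an empty copy of df1's columns
# would merge VAR1 against itself and produce VAR1_x / VAR1_y.
df_final = dfs[0]

for df in dfs[1:]:
    df_final = df_final.merge(df, on=['depth','profile'], how='outer')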
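A sketch of the same loop reading one file per iteration, along the lines of the asker's for m in range(len(myfiles)) loop. The io.StringIO objects are hypothetical stand-ins for real file paths. The loop is seeded with the first file rather than an empty frame, because merging a frame into an empty copy of its own columns would produce VAR1_x and VAR1_y.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files; use paths in practice.
myfiles = [
    io.StringIO("depth,VAR1,profile\n0.5,38.196202,profile_1\n0.6,38.198002,profile_1\n"),
    io.StringIO("depth,VAR2,profile\n0.6,0.20440,profile_1\n1.1,0.20442,profile_1\n"),
    io.StringIO("depth,VAR3,profile\n1.2,15.1880,profile_1\n1.3,15.1820,profile_1\n"),
]

# Seed with the first file so every VARX column appears exactly once.
df_final = pd.read_csv(myfiles[0])
for f in myfiles[1:]:
    # One merge per iteration; each file is read only when it is needed.
    df_final = df_final.merge(pd.read_csv(f), on=['depth', 'profile'], how='outer')

print(df_final)
```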

edited 15 hours ago

answered 15 hours ago by yatu

That's awesome, thank you! How would you do it within a loop, where I read a separate file for each df? For example: for m in range(len(myfiles)): df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

Well, the main purpose of reduce here is to avoid an explicit loop. If you prefer a loop (I assume because of memory constraints), you need a single merge on each iteration: simply update the resulting dataframe in each pass.

– yatu
15 hours ago

Thank you, that's super helpful, but would you perhaps care to show what such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue.

– PEBKAC
15 hours ago


Check the update @PEBKAC

– yatu
15 hours ago

Well, if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest working with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago


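I would use append. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; there, pd.concat([df1, df2, df3]) is the equivalent.)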

>>> df1.append(df2).append(df3).sort_values('depth')



        VAR1     VAR2    VAR3  depth    profile

0  38.196202      NaN     NaN    0.5  profile_1

1  38.198002      NaN     NaN    0.6  profile_1

0        NaN  0.20440     NaN    0.6  profile_1

1        NaN  0.20442     NaN    1.1  profile_1

2        NaN  0.20446     NaN    1.2  profile_1

0        NaN      NaN  15.188    1.2  profile_1

2  38.200001      NaN     NaN    1.3  profile_1

1        NaN      NaN  15.182    1.3  profile_1

2        NaN      NaN  15.182    1.4  profile_1

Obviously if you have a lot of dataframes, just make a list and loop through them.
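With many frames, the list-and-loop idea can be written as a single pd.concat call, which is also what replaces append in pandas 2.0 and later. Stacking alone keeps one row per source row (depth 0.6 appears twice above); the groupby(...).first() step is an optional addition, not part of this answer, that collapses rows sharing a (profile, depth) key into the compressed layout the question asks for. It assumes each key has at most one non-null value per VARX column.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

dfs = [df1, df2, df3]  # works the same for any number of frames

# Vertical stack; columns missing from a frame are filled with NaN.
stacked = pd.concat(dfs, ignore_index=True)

# first() takes the first non-null value of each column within a group,
# so every (profile, depth) key ends up on a single row.
compressed = stacked.groupby(['profile', 'depth'], as_index=False).first()
print(compressed)
```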

edited 15 hours ago

answered 15 hours ago by BlivetWidget

Thank you, @BlivetWidget! How do you sort it by both depth AND profile? Each profile has a set of depths, and each dataframe has a bunch of profiles.

– PEBKAC
10 hours ago


@PEBKAC you can sort it by however many parameters you want, in whatever order you want. .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). You can check the help on df1.sort_values to learn how to change the sort order, to sort in place, and various other optional parameters.

– BlivetWidget
10 hours ago
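A small illustration of that comment, using a hypothetical two-profile frame (the question's data has only one profile, so the difference would not show). The first key in the list is the primary sort, and ascending accepts one flag per key.

```python
import pandas as pd

# Hypothetical stacked data with two profiles.
stacked = pd.DataFrame({
    'profile': ['profile_2', 'profile_1', 'profile_1', 'profile_2'],
    'depth': [0.5, 1.3, 0.6, 0.2],
    'VAR1': [1.0, 2.0, 3.0, 4.0],
})

# Group rows by profile, then order by depth within each profile.
by_profile = stacked.sort_values(['profile', 'depth'])

# Same grouping, but deepest measurement first within each profile.
deepest_first = stacked.sort_values(['profile', 'depth'], ascending=[True, False])

print(by_profile)
print(deepest_first)
```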

thank you, most helpful!

– PEBKAC
10 hours ago


Why not concatenate all the Data Frames, melt, then reform them using your ids? There might be a more efficient way to do this, but this works.

df=pd.melt(pd.concat([df1,df2,df3]),id_vars=['profile','depth'])

df_pivot=df.pivot_table(index=['profile','depth'],columns='variable',values='value')

Where df_pivot will be

variable              VAR1     VAR2    VAR3
profile   depth                            
profile_1 0.5    38.196202      NaN     NaN
          0.6    38.198002  0.20440     NaN
          1.1          NaN  0.20442     NaN
          1.2          NaN  0.20446  15.188
          1.3    38.200001      NaN  15.182
          1.4          NaN      NaN  15.182
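One detail worth knowing about this approach: pivot_table aggregates with aggfunc='mean' by default, so any duplicated (profile, depth, variable) entries are silently averaged. A sketch that states the aggregation explicitly and flattens the result back into ordinary columns (the flattening step is an addition, not part of this answer):

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Long format: one row per (profile, depth, variable) with its value.
long = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])

# mean is also the default; spelling it out makes the duplicate handling visible.
wide = long.pivot_table(index=['profile', 'depth'], columns='variable',
                        values='value', aggfunc='mean')

# Turn the (profile, depth) index back into columns and drop the 'variable' label.
flat = wide.reset_index()
flat.columns.name = None
print(flat)
```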

answered 15 hours ago by SEpapoulis


You can also use:

dfs = [df1, df2, df3]
df = pd.merge(dfs[0], dfs[1], on=['depth','profile'], how='outer')
for d in dfs[2:]:
    df = pd.merge(df, d, on=['depth','profile'], how='outer')



   depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182

answered 15 hours ago by heena bawa


Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55652704%2fmerge-multiple-dataframes-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

Consider setting index on each data frame and then run the horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]



print(pd.concat(dfs, axis=1).reset_index())

#      profile  depth       VAR1     VAR2    VAR3

# 0  profile_1    0.5  38.198002      NaN     NaN

# 1  profile_1    0.6  38.198002  0.20440     NaN

# 2  profile_1    1.1        NaN  0.20442     NaN

# 3  profile_1    1.2        NaN  0.20446  15.188

# 4  profile_1    1.3  38.200001      NaN  15.182

# 5  profile_1    1.4        NaN      NaN  15.182

answered 15 hours ago

Parfait

54.3k104872

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago

1

You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

– Parfait
14 hours ago

1

I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

1

You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

– Parfait
12 hours ago

|
show 8 more comments

Consider setting index on each data frame and then run the horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]



print(pd.concat(dfs, axis=1).reset_index())

#      profile  depth       VAR1     VAR2    VAR3

# 0  profile_1    0.5  38.198002      NaN     NaN

# 1  profile_1    0.6  38.198002  0.20440     NaN

# 2  profile_1    1.1        NaN  0.20442     NaN

# 3  profile_1    1.2        NaN  0.20446  15.188

# 4  profile_1    1.3  38.200001      NaN  15.182

# 5  profile_1    1.4        NaN      NaN  15.182

answered 15 hours ago

Parfait

54.3k104872

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago

1

You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

– Parfait
14 hours ago

1

I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

1

You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

– Parfait
12 hours ago

|
show 8 more comments

Consider setting index on each data frame and then run the horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]



print(pd.concat(dfs, axis=1).reset_index())

#      profile  depth       VAR1     VAR2    VAR3

# 0  profile_1    0.5  38.198002      NaN     NaN

# 1  profile_1    0.6  38.198002  0.20440     NaN

# 2  profile_1    1.1        NaN  0.20442     NaN

# 3  profile_1    1.2        NaN  0.20446  15.188

# 4  profile_1    1.3  38.200001      NaN  15.182

# 5  profile_1    1.4        NaN      NaN  15.182

answered 15 hours ago

Parfait

54.3k104872

Consider setting index on each data frame and then run the horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]



print(pd.concat(dfs, axis=1).reset_index())

#      profile  depth       VAR1     VAR2    VAR3

# 0  profile_1    0.5  38.198002      NaN     NaN

# 1  profile_1    0.6  38.198002  0.20440     NaN

# 2  profile_1    1.1        NaN  0.20442     NaN

# 3  profile_1    1.2        NaN  0.20446  15.188

# 4  profile_1    1.3  38.200001      NaN  15.182

# 5  profile_1    1.4        NaN      NaN  15.182

answered 15 hours ago

Parfait

54.3k104872

answered 15 hours ago

Parfait

54.3k104872

answered 15 hours ago

Parfait

54.3k104872

answered 15 hours ago

Parfait

54.3k104872

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago

1

You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

– Parfait
14 hours ago

1

I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

1

You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

– Parfait
12 hours ago

|
show 8 more comments

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago

1

You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

– Parfait
14 hours ago

1

I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

1

You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

– Parfait
12 hours ago

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

– Parfait
15 hours ago

You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

– Parfait
14 hours ago

I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

– Parfait
14 hours ago

You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

– Parfait
12 hours ago

|
show 8 more comments

Or using merge:

from functools import partial, reduce



dfs = [df1,df2,df3]

merge = partial(pd.merge, on=['depth','profile'], how='outer')

reduce(merge, dfs)



    depth       VAR1    profile     VAR2    VAR3

0    0.6  38.198002  profile_1  0.20440     NaN

1    0.6  38.198002  profile_1  0.20440     NaN

2    1.3  38.200001  profile_1      NaN  15.182

3    1.1        NaN  profile_1  0.20442     NaN

4    1.2        NaN  profile_1  0.20446  15.188

5    1.4        NaN  profile_1      NaN  15.182

Update

For merging the dataframes in a loop as suggested in the comments, you could do something like:

df_final = pd.DataFrame(columns=df1.columns)

for df in dfs:

    df_final = df_final.merge(df, on=['depth','profile'], how='outer')

edited 15 hours ago

answered 15 hours ago

yatu

15.8k41642

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

– yatu
15 hours ago

thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

– PEBKAC
15 hours ago

1

Check the update @PEBKAC

– yatu
15 hours ago

1

Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago

|
show 4 more comments

Or using merge:

from functools import partial, reduce



dfs = [df1,df2,df3]

merge = partial(pd.merge, on=['depth','profile'], how='outer')

reduce(merge, dfs)



    depth       VAR1    profile     VAR2    VAR3

0    0.6  38.198002  profile_1  0.20440     NaN

1    0.6  38.198002  profile_1  0.20440     NaN

2    1.3  38.200001  profile_1      NaN  15.182

3    1.1        NaN  profile_1  0.20442     NaN

4    1.2        NaN  profile_1  0.20446  15.188

5    1.4        NaN  profile_1      NaN  15.182

Update

For merging the dataframes in a loop as suggested in the comments, you could do something like:

df_final = pd.DataFrame(columns=df1.columns)

for df in dfs:

    df_final = df_final.merge(df, on=['depth','profile'], how='outer')

edited 15 hours ago

answered 15 hours ago

yatu

15.8k41642

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

– yatu
15 hours ago

thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

– PEBKAC
15 hours ago

1

Check the update @PEBKAC

– yatu
15 hours ago

1

Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago

|
show 4 more comments

Or using merge:

from functools import partial, reduce



dfs = [df1,df2,df3]

merge = partial(pd.merge, on=['depth','profile'], how='outer')

reduce(merge, dfs)



    depth       VAR1    profile     VAR2    VAR3

0    0.6  38.198002  profile_1  0.20440     NaN

1    0.6  38.198002  profile_1  0.20440     NaN

2    1.3  38.200001  profile_1      NaN  15.182

3    1.1        NaN  profile_1  0.20442     NaN

4    1.2        NaN  profile_1  0.20446  15.188

5    1.4        NaN  profile_1      NaN  15.182

Update

For merging the dataframes in a loop as suggested in the comments, you could do something like:

df_final = pd.DataFrame(columns=df1.columns)

for df in dfs:

    df_final = df_final.merge(df, on=['depth','profile'], how='outer')

edited 15 hours ago

answered 15 hours ago

yatu

15.8k41642

Or using merge:

from functools import partial, reduce



dfs = [df1,df2,df3]

merge = partial(pd.merge, on=['depth','profile'], how='outer')

reduce(merge, dfs)



    depth       VAR1    profile     VAR2    VAR3

0    0.6  38.198002  profile_1  0.20440     NaN

1    0.6  38.198002  profile_1  0.20440     NaN

2    1.3  38.200001  profile_1      NaN  15.182

3    1.1        NaN  profile_1  0.20442     NaN

4    1.2        NaN  profile_1  0.20446  15.188

5    1.4        NaN  profile_1      NaN  15.182

Update

For merging the dataframes in a loop as suggested in the comments, you could do something like:

df_final = pd.DataFrame(columns=df1.columns)

for df in dfs:

    df_final = df_final.merge(df, on=['depth','profile'], how='outer')

edited 15 hours ago

answered 15 hours ago

yatu

15.8k41642

edited 15 hours ago

answered 15 hours ago

yatu

15.8k41642

answered 15 hours ago

yatu

15.8k41642

answered 15 hours ago

yatu

15.8k41642

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

– yatu
15 hours ago

thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

– PEBKAC
15 hours ago

1

Check the update @PEBKAC

– yatu
15 hours ago

1

Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago

|
show 4 more comments

that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

– PEBKAC
15 hours ago

1

Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

– yatu
15 hours ago

thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

– PEBKAC
15 hours ago

1

Check the update @PEBKAC

– yatu
15 hours ago

1

Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago

Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

– yatu
15 hours ago

thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

– PEBKAC
15 hours ago

Check the update @PEBKAC

– yatu
15 hours ago

Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

– yatu
14 hours ago

|
show 4 more comments

I would use append.

>>> df1.append(df2).append(df3).sort_values('depth')
        VAR1     VAR2    VAR3  depth    profile
0  38.196202      NaN     NaN    0.5  profile_1
1  38.198002      NaN     NaN    0.6  profile_1
0        NaN  0.20440     NaN    0.6  profile_1
1        NaN  0.20442     NaN    1.1  profile_1
2        NaN  0.20446     NaN    1.2  profile_1
0        NaN      NaN  15.188    1.2  profile_1
2  38.200001      NaN     NaN    1.3  profile_1
1        NaN      NaN  15.182    1.3  profile_1
2        NaN      NaN  15.182    1.4  profile_1

If you have a lot of dataframes, just put them in a list and loop through them.


answered 15 hours ago

BlivetWidget
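DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above raises an AttributeError on current versions. A sketch of the same stacking with pd.concat, which accepts the whole list of frames in one call, so no loop is needed however many frames there are. Here sort=True is passed only to reproduce the alphabetical column order shown above, and kind="stable" keeps rows with equal depth in list order.

```python
import pandas as pd

df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.1880, 15.1820, 15.1820],
                    "profile": ["profile_1"] * 3})

dfs = [df1, df2, df3]  # any number of frames
# Stack all frames at once; sort=True orders the union of columns alphabetically.
stacked = pd.concat(dfs, sort=True).sort_values("depth", kind="stable")
print(stacked)
```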


Thank you, @BlivetWidget! How do you sort it by both depth AND profile? Each profile has a set of depths, and each dataframe has a bunch of profiles.

– PEBKAC
10 hours ago


@PEBKAC you can sort by as many columns as you want, in whatever order you want: .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). Check the help on df1.sort_values to learn how to change the sort order, sort in place, and use various other optional parameters.

– BlivetWidget
10 hours ago

thank you, most helpful!

– PEBKAC
10 hours ago
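In the stacked result from this answer there is still one row per input row: depth 0.6, for instance, appears twice, once carrying VAR1 and once carrying VAR2. If you want one row per (profile, depth) as in the question, a sketch that sorts by both keys as suggested in the comment and then collapses each group with GroupBy.first, which takes the first non-null value in each column. This assumes each (profile, depth) pair occurs at most once per frame; otherwise first() silently keeps only one of the duplicated values.

```python
import pandas as pd

df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.1880, 15.1820, 15.1820],
                    "profile": ["profile_1"] * 3})

stacked = pd.concat([df1, df2, df3], ignore_index=True)
# Sort by profile first, then by depth within each profile.
stacked = stacked.sort_values(["profile", "depth"], kind="stable")
# Collapse to one row per (profile, depth); first() skips NaN,
# so each VARX column keeps the value that was actually measured.
combined = stacked.groupby(["profile", "depth"], as_index=False).first()
print(combined)
```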


Why not concatenate all the DataFrames, melt them into long format, and then pivot them back using your id columns? There might be a more efficient way to do this, but this works.

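# stack all frames, then melt to long format (profile, depth, variable, value)
df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
# pivot back to wide format: one row per (profile, depth), one column per VARX
df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')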

df_pivot will then be:

variable              VAR1     VAR2    VAR3
profile   depth
profile_1 0.5    38.196202      NaN     NaN
          0.6    38.198002  0.20440     NaN
          1.1          NaN  0.20442     NaN
          1.2          NaN  0.20446  15.188
          1.3    38.200001      NaN  15.182
          1.4          NaN      NaN  15.182

answered 15 hours ago

SEpapoulis
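Two details are worth knowing about the melt/pivot approach: pivot_table aggregates with the mean by default, so any duplicated (profile, depth, variable) entry would be averaged silently, and the result carries a MultiIndex. A sketch, assuming you would rather keep the first value and get back the flat profile/depth/VARX layout from the question; aggfunc="first", reset_index() and rename_axis(columns=None) are standard pandas calls, and the rest mirrors the answer.

```python
import pandas as pd

df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.1880, 15.1820, 15.1820],
                    "profile": ["profile_1"] * 3})

dfs = [df1, df2, df3]  # works the same for any number of frames
long = pd.concat(dfs).melt(id_vars=["profile", "depth"], var_name="variable", value_name="value")
wide = (
    long.pivot_table(index=["profile", "depth"], columns="variable",
                     values="value", aggfunc="first")  # keep first value instead of averaging
    .reset_index()                 # profile and depth back to ordinary columns
    .rename_axis(columns=None)     # drop the leftover 'variable' columns label
)
print(wide)
```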


You can also chain outer merges on the key columns:

dfs = [df1, df2, df3]
df = pd.merge(dfs[0], dfs[1], on=['depth', 'profile'], how='outer')
for d in dfs[2:]:
    df = pd.merge(df, d, on=['depth', 'profile'], how='outer')

   depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182

answered 15 hours ago

heena bawa
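Two refinements to the merge loop, as a sketch: validate="one_to_one" makes pd.merge raise a MergeError if a (depth, profile) pair is duplicated in either frame, instead of silently multiplying rows, and an explicit sort at the end gives a predictable row order, since the row order produced by an outer merge can depend on the pandas version.

```python
import pandas as pd

df1 = pd.DataFrame({"depth": [0.5, 0.6, 1.3],
                    "VAR1": [38.196202, 38.198002, 38.200001],
                    "profile": ["profile_1"] * 3})
df2 = pd.DataFrame({"depth": [0.6, 1.1, 1.2],
                    "VAR2": [0.20440, 0.20442, 0.20446],
                    "profile": ["profile_1"] * 3})
df3 = pd.DataFrame({"depth": [1.2, 1.3, 1.4],
                    "VAR3": [15.1880, 15.1820, 15.1820],
                    "profile": ["profile_1"] * 3})

keys = ["depth", "profile"]
dfs = [df1, df2, df3]

merged = dfs[0]
for d in dfs[1:]:
    # validate="one_to_one" raises pandas.errors.MergeError on duplicated keys in either frame.
    merged = pd.merge(merged, d, on=keys, how="outer", validate="one_to_one")

# Explicit final order: by profile, then by depth within each profile.
merged = merged.sort_values(["profile", "depth"], ignore_index=True)
print(merged)
```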
