Merge multiple DataFrames Pandas





This might be considered a duplicate of a thorough explanation of various approaches; however, I can't find a solution to my problem there because I have a larger number of DataFrames.



I have multiple DataFrames (more than 10), each differing in one column, VARX. This is a quick, oversimplified example:



import pandas as pd

df1 = pd.DataFrame({'depth': [0.500000, 0.600000, 1.300000],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})

df2 = pd.DataFrame({'depth': [0.600000, 1.100000, 1.200000],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})

df3 = pd.DataFrame({'depth': [1.200000, 1.300000, 1.400000],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})


Each df has the same or different depths for the same profiles, so I need to create a new DataFrame that merges all the separate ones. The key columns for the operation are depth and profile, and every depth value that appears for a profile should be kept.

The VARX value should therefore be NaN where there is no depth measurement of that variable for that profile.

The result should thus be a new, compressed DataFrame with all VARX columns added to the depth and profile ones, something like this:



profile    depth     VAR1       VAR2     VAR3
profile_1  0.500000  38.196202  NaN      NaN
profile_1  0.600000  38.198002  0.20440  NaN
profile_1  1.100000  NaN        0.20442  NaN
profile_1  1.200000  NaN        0.20446  15.1880
profile_1  1.300000  38.200001  NaN      15.1820
profile_1  1.400000  NaN        NaN      15.1820


Note that the actual number of profiles is much, much bigger.



Any ideas?










Tags: python, pandas, dataframe

asked 16 hours ago (edited 12 hours ago)
– PEBKAC
























          5 Answers














          Consider setting the key columns as the index on each DataFrame and then running a horizontal merge with pd.concat:



          dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

          print(pd.concat(dfs, axis=1).reset_index())
          #      profile  depth       VAR1     VAR2    VAR3
          # 0  profile_1    0.5  38.196202      NaN     NaN
          # 1  profile_1    0.6  38.198002  0.20440     NaN
          # 2  profile_1    1.1        NaN  0.20442     NaN
          # 3  profile_1    1.2        NaN  0.20446  15.188
          # 4  profile_1    1.3  38.200001      NaN  15.182
          # 5  profile_1    1.4        NaN      NaN  15.182





          answered 15 hours ago
          – Parfait
























          • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])
            – PEBKAC
            15 hours ago

          • Ah, my mistake, do not bracket m, which casts it as a list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
            – Parfait
            15 hours ago

          • You have multiple rows with the same profile AND depth. Originally you had that same issue in your post, and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting the index and concatenating.
            – Parfait
            14 hours ago

          • I believe that is a different question, and you already accepted a solution here (which, come to think of it, may result in duplicate joins). Make an earnest effort and come back to SO with specific issues.
            – Parfait
            14 hours ago

          • You should close this one out, as the answers here do resolve your immediate question and even use the posted data. The data size, and even data content with dups, is a different question.
            – Parfait
            12 hours ago
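
Following the comments above, here is a minimal sketch of aggregating duplicate (profile, depth) rows before the index-based concat. The duplicated row in `df1_dup`, the helper name `collapse_duplicates`, and the choice of the mean as the aggregate are all assumptions for illustration; another aggregation (or `drop_duplicates`) may suit real measurements better.

```python
import pandas as pd

KEYS = ['profile', 'depth']

# Hypothetical input: the first frame repeats the key ('profile_1', 0.6).
df1_dup = pd.DataFrame({'depth': [0.6, 0.6, 1.3],
                        'VAR1': [38.0, 39.0, 38.2],
                        'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})


def collapse_duplicates(df, keys=KEYS):
    """Average rows that share the same key so the resulting index is unique."""
    return df.groupby(keys).mean()


# groupby already returns the keys as a unique MultiIndex,
# so the frames can be concatenated side by side directly.
dfs = [collapse_duplicates(df) for df in (df1_dup, df2)]
combined = pd.concat(dfs, axis=1).sort_index().reset_index()
print(combined)
```

The explicit `sort_index()` is there so the row order does not depend on how a given pandas version orders the combined index.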

































          Or using merge:



          from functools import partial, reduce

          dfs = [df1,df2,df3]
          merge = partial(pd.merge, on=['depth','profile'], how='outer')
          reduce(merge, dfs)

             depth       VAR1    profile     VAR2    VAR3
          0    0.5  38.196202  profile_1      NaN     NaN
          1    0.6  38.198002  profile_1  0.20440     NaN
          2    1.3  38.200001  profile_1      NaN  15.182
          3    1.1        NaN  profile_1  0.20442     NaN
          4    1.2        NaN  profile_1  0.20446  15.188
          5    1.4        NaN  profile_1      NaN  15.182


          Update



          For merging the dataframes in a loop as suggested in the comments, you could do something like:



          # Seed the result with the first frame, then merge the rest one at a time.
          df_final = dfs[0]
          for df in dfs[1:]:
              df_final = df_final.merge(df, on=['depth','profile'], how='outer')





          • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])
            – PEBKAC
            15 hours ago

          • Well, the main purpose of reduce here is to avoid a loop. If you prefer that approach, I assume because of memory constraints, you need a single merge on each iteration. Simply update the resulting DataFrame on each loop.
            – yatu
            15 hours ago

          • thank you, that's super helpful, but would you perhaps care to show what such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue
            – PEBKAC
            15 hours ago

          • Check the update @PEBKAC
            – yatu
            15 hours ago

          • Well, if you have to end up merging them all, you likely won't be able to obtain the final DataFrame anyway. I'd suggest working with chunks of data. Check stackoverflow.com/questions/47386405/…
            – yatu
            14 hours ago
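
As a rough sketch of the "single merge on each iteration" idea from the comments, the loop below reads one source at a time and folds it into a running result. The `io.StringIO` objects are stand-ins for the CSV files (their contents are copied from the question's sample data); real code would presumably pass the file paths instead. Passing `validate='one_to_one'` is an optional extra, not something the answer uses: it makes pandas raise a `MergeError` when a (depth, profile) key is duplicated, rather than silently multiplying rows.

```python
import io
import pandas as pd

KEYS = ['depth', 'profile']

# Stand-ins for the separate CSV files; contents mirror df1, df2, df3.
sources = [
    io.StringIO("depth,VAR1,profile\n"
                "0.5,38.196202,profile_1\n"
                "0.6,38.198002,profile_1\n"
                "1.3,38.200001,profile_1\n"),
    io.StringIO("depth,VAR2,profile\n"
                "0.6,0.20440,profile_1\n"
                "1.1,0.20442,profile_1\n"
                "1.2,0.20446,profile_1\n"),
    io.StringIO("depth,VAR3,profile\n"
                "1.2,15.1880,profile_1\n"
                "1.3,15.1820,profile_1\n"
                "1.4,15.1820,profile_1\n"),
]

df_final = None
for src in sources:
    df = pd.read_csv(src)  # only one input frame is held at a time
    if df_final is None:
        df_final = df      # the first file seeds the result
    else:
        # validate raises pandas.errors.MergeError on duplicated keys
        df_final = df_final.merge(df, on=KEYS, how='outer',
                                  validate='one_to_one')

# Sort explicitly so the row order does not depend on the pandas version.
df_final = df_final.sort_values(['profile', 'depth']).reset_index(drop=True)
print(df_final)
```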

































          I would use append.



          >>> df1.append(df2).append(df3).sort_values('depth')

                  VAR1     VAR2    VAR3  depth    profile
          0  38.196202      NaN     NaN    0.5  profile_1
          1  38.198002      NaN     NaN    0.6  profile_1
          0        NaN  0.20440     NaN    0.6  profile_1
          1        NaN  0.20442     NaN    1.1  profile_1
          2        NaN  0.20446     NaN    1.2  profile_1
          0        NaN      NaN  15.188    1.2  profile_1
          2  38.200001      NaN     NaN    1.3  profile_1
          1        NaN      NaN  15.182    1.3  profile_1
          2        NaN      NaN  15.182    1.4  profile_1


          Obviously if you have a lot of dataframes, just make a list and loop through them.
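
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same stacking can be written with `pd.concat` over a list. The sketch below also collapses the stacked rows to one per (profile, depth); using `groupby(...).first()` for that step is an assumption added here, not part of the answer, and it presumes each variable has at most one value per key.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Row-wise stacking: the replacement for chained DataFrame.append calls.
stacked = pd.concat([df1, df2, df3], ignore_index=True)

# first() keeps the first non-null value of every column within each group,
# which folds the partially filled rows into one row per key.
result = stacked.groupby(['profile', 'depth'], as_index=False).first()
print(result)
```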






          • thank you! @BlivetWidget, how do you sort it both by depth AND profile? each profile has a set of depths and each dataframe has a bunch of profiles?
            – PEBKAC
            10 hours ago

          • @PEBKAC you can sort it by however many parameters you want, in whatever order you want. .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). You can check the help on df1.sort_values to learn how to change the sort order, to sort in place, and various other optional parameters.
            – BlivetWidget
            10 hours ago

          • thank you, most helpful!
            – PEBKAC
            10 hours ago

































          Why not concatenate all the DataFrames, melt them, and then reshape the result using your id columns? There might be a more efficient way to do this, but this works.



          df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
          df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')


          Where df_pivot will be



          variable              VAR1     VAR2    VAR3
          profile   depth
          profile_1 0.5    38.196202      NaN     NaN
                    0.6    38.198002  0.20440     NaN
                    1.1          NaN  0.20442     NaN
                    1.2          NaN  0.20446  15.188
                    1.3    38.200001      NaN  15.182
                    1.4          NaN      NaN  15.182
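
The pivoted frame keeps profile and depth in the index and carries a `variable` label on the columns. If flat columns like the question's desired output are wanted, something along these lines should work. The names `melted` and `flat` are illustrative only. Keep in mind that `pivot_table` aggregates with the mean by default, so duplicated (profile, depth) keys would be averaged without warning.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

melted = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
# Drop the NaN placeholders created by stacking frames with different columns.
melted = melted.dropna(subset=['value'])

df_pivot = melted.pivot_table(index=['profile', 'depth'],
                              columns='variable', values='value')

# Move the index levels back to columns and clear the 'variable' axis label.
flat = df_pivot.reset_index().rename_axis(columns=None)
print(flat)
```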



















            You can also use:



            dfs = [df1, df2, df3]
            df = pd.merge(dfs[0], dfs[1], left_on=['depth','profile'], right_on=['depth','profile'], how='outer')
            for d in dfs[2:]:
                df = pd.merge(df, d, left_on=['depth','profile'], right_on=['depth','profile'], how='outer')

               depth       VAR1    profile     VAR2    VAR3
            0    0.5  38.196202  profile_1      NaN     NaN
            1    0.6  38.198002  profile_1  0.20440     NaN
            2    1.3  38.200001  profile_1      NaN  15.182
            3    1.1        NaN  profile_1  0.20442     NaN
            4    1.2        NaN  profile_1  0.20446  15.188
            5    1.4        NaN  profile_1      NaN  15.182
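
The concat-based and merge-based answers differ in row and column order, so a quick way to convince yourself they agree is to put both results into a canonical order and compare them. This is only a sanity-check sketch on the question's sample data; the helper `tidy` is a hypothetical name introduced here.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

KEYS = ['profile', 'depth']
dfs = [df1, df2, df3]


def tidy(df):
    """Put columns and rows in a canonical order for comparison."""
    cols = KEYS + sorted(c for c in df.columns if c not in KEYS)
    return df[cols].sort_values(KEYS).reset_index(drop=True)


# Index-based horizontal concat.
via_concat = pd.concat([d.set_index(KEYS) for d in dfs], axis=1).reset_index()

# Pairwise outer merges folded over the list.
via_merge = reduce(
    lambda left, right: pd.merge(left, right, on=KEYS, how='outer'), dfs)

# Raises AssertionError if the two results differ (NaNs compare as equal).
pd.testing.assert_frame_equal(tidy(via_concat), tidy(via_merge))
print(tidy(via_merge))
```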





            share|improve this answer
























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55652704%2fmerge-multiple-dataframes-pandas%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              5 Answers
              5






              active

              oldest

              votes








              5 Answers
              5






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5














              Consider setting index on each data frame and then run the horizontal merge with pd.concat:



              dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

              print(pd.concat(dfs, axis=1).reset_index())
              # profile depth VAR1 VAR2 VAR3
              # 0 profile_1 0.5 38.198002 NaN NaN
              # 1 profile_1 0.6 38.198002 0.20440 NaN
              # 2 profile_1 1.1 NaN 0.20442 NaN
              # 3 profile_1 1.2 NaN 0.20446 15.188
              # 4 profile_1 1.3 38.200001 NaN 15.182
              # 5 profile_1 1.4 NaN NaN 15.182





              share|improve this answer
























              • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

                – PEBKAC
                15 hours ago








              • 1





                Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

                – Parfait
                15 hours ago






              • 1





                You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

                – Parfait
                14 hours ago






              • 1





                I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

                – Parfait
                14 hours ago






              • 1





                You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

                – Parfait
                12 hours ago
















              5














              Consider setting index on each data frame and then run the horizontal merge with pd.concat:



              dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

              print(pd.concat(dfs, axis=1).reset_index())
              # profile depth VAR1 VAR2 VAR3
              # 0 profile_1 0.5 38.198002 NaN NaN
              # 1 profile_1 0.6 38.198002 0.20440 NaN
              # 2 profile_1 1.1 NaN 0.20442 NaN
              # 3 profile_1 1.2 NaN 0.20446 15.188
              # 4 profile_1 1.3 38.200001 NaN 15.182
              # 5 profile_1 1.4 NaN NaN 15.182





              share|improve this answer
























              • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

                – PEBKAC
                15 hours ago








              • 1





                Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

                – Parfait
                15 hours ago






              • 1





                You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

                – Parfait
                14 hours ago






              • 1





                I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

                – Parfait
                14 hours ago






              • 1





                You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

                – Parfait
                12 hours ago














              5












              5








              5







              Consider setting index on each data frame and then run the horizontal merge with pd.concat:



              dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

              print(pd.concat(dfs, axis=1).reset_index())
              # profile depth VAR1 VAR2 VAR3
              # 0 profile_1 0.5 38.198002 NaN NaN
              # 1 profile_1 0.6 38.198002 0.20440 NaN
              # 2 profile_1 1.1 NaN 0.20442 NaN
              # 3 profile_1 1.2 NaN 0.20446 15.188
              # 4 profile_1 1.3 38.200001 NaN 15.182
              # 5 profile_1 1.4 NaN NaN 15.182





              share|improve this answer













              Consider setting index on each data frame and then run the horizontal merge with pd.concat:



              dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

              print(pd.concat(dfs, axis=1).reset_index())
              # profile depth VAR1 VAR2 VAR3
              # 0 profile_1 0.5 38.198002 NaN NaN
              # 1 profile_1 0.6 38.198002 0.20440 NaN
              # 2 profile_1 1.1 NaN 0.20442 NaN
              # 3 profile_1 1.2 NaN 0.20446 15.188
              # 4 profile_1 1.3 38.200001 NaN 15.182
              # 5 profile_1 1.4 NaN NaN 15.182






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered 15 hours ago









              ParfaitParfait

              54.3k104872




              54.3k104872













              • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

                – PEBKAC
                15 hours ago








              • 1





                Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

                – Parfait
                15 hours ago






              • 1





                You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

                – Parfait
                14 hours ago






              • 1





                I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

                – Parfait
                14 hours ago






              • 1





                You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

                – Parfait
                12 hours ago



















              • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

                – PEBKAC
                15 hours ago








              • 1





                Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

                – Parfait
                15 hours ago






              • 1





                You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

                – Parfait
                14 hours ago






              • 1





                I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

                – Parfait
                14 hours ago






              • 1





                You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

                – Parfait
                12 hours ago

















              that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

              – PEBKAC
              15 hours ago







              that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

              – PEBKAC
              15 hours ago






              1




              1





              Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

              – Parfait
              15 hours ago





              Ah, my mistake, do not bracket m which casts as list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]

              – Parfait
              15 hours ago




              1




              1





              You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

              – Parfait
              14 hours ago





              You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.

              – Parfait
              14 hours ago




              1




              1





              I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

              – Parfait
              14 hours ago





              I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.

              – Parfait
              14 hours ago




              1




              1





              You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

              – Parfait
              12 hours ago





              You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.

              – Parfait
              12 hours ago













              3














              Or using merge:



              from functools import partial, reduce

              dfs = [df1,df2,df3]
              merge = partial(pd.merge, on=['depth','profile'], how='outer')
              reduce(merge, dfs)

              depth VAR1 profile VAR2 VAR3
              0 0.6 38.198002 profile_1 0.20440 NaN
              1 0.6 38.198002 profile_1 0.20440 NaN
              2 1.3 38.200001 profile_1 NaN 15.182
              3 1.1 NaN profile_1 0.20442 NaN
              4 1.2 NaN profile_1 0.20446 15.188
              5 1.4 NaN profile_1 NaN 15.182


              Update



              For merging the dataframes in a loop as suggested in the comments, you could do something like:



              df_final = pd.DataFrame(columns=df1.columns)
              for df in dfs:
              df_final = df_final.merge(df, on=['depth','profile'], how='outer')





              share|improve this answer


























              • that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m])

                – PEBKAC
                15 hours ago








              • 1





                Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop

                – yatu
                15 hours ago













              • thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue

                – PEBKAC
                15 hours ago






              • 1





                Check the update @PEBKAC

                – yatu
                15 hours ago






              • 1





                Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…

                – yatu
                14 hours ago
















              3














              answered 15 hours ago, edited 15 hours ago – yatu













              • That's awesome, thank you! How would you do it within a loop, for example for m in range(len(myfiles)): df = pd.read_csv(myfiles[m]), where I read a separate file for each df?
                – PEBKAC
                15 hours ago

              • Well, the main purpose of reduce here is to avoid a loop. If you prefer a loop, I assume because of memory constraints, you need a single merge on each iteration: simply update the resulting dataframe on each pass.
                – yatu
                15 hours ago

              • Thank you, that's super helpful, but would you care to show what such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue.
                – PEBKAC
                15 hours ago

              • Check the update, @PEBKAC.
                – yatu
                15 hours ago

              • Well, if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest working with chunks of data. Check stackoverflow.com/questions/47386405/…
                – yatu
                14 hours ago
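
The file-by-file loop asked about in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `io.StringIO` objects are hypothetical stand-ins for the entries of `myfiles`, and the final sort is optional.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files; real code would list file paths.
myfiles = [
    io.StringIO("depth,VAR1,profile\n0.5,38.196202,profile_1\n0.6,38.198002,profile_1\n1.3,38.200001,profile_1\n"),
    io.StringIO("depth,VAR2,profile\n0.6,0.20440,profile_1\n1.1,0.20442,profile_1\n1.2,0.20446,profile_1\n"),
    io.StringIO("depth,VAR3,profile\n1.2,15.1880,profile_1\n1.3,15.1820,profile_1\n1.4,15.1820,profile_1\n"),
]

keys = ["depth", "profile"]
df_final = None
for f in myfiles:
    df = pd.read_csv(f)
    if df_final is None:
        df_final = df  # the first file seeds the result
    else:
        # every later file is outer-merged in, so only one extra frame is held at a time
        df_final = df_final.merge(df, on=keys, how="outer")

# Optional: order the rows by profile, then by depth within each profile.
df_final = df_final.sort_values(["profile", "depth"]).reset_index(drop=True)
print(df_final)
```

Seeding with the first file rather than an empty frame avoids suffixed duplicate columns and keeps the key dtypes as read from the files.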

































              I would use append.

              >>> df1.append(df2).append(df3).sort_values('depth')

                      VAR1     VAR2    VAR3  depth    profile
              0  38.196202      NaN     NaN    0.5  profile_1
              1  38.198002      NaN     NaN    0.6  profile_1
              0        NaN  0.20440     NaN    0.6  profile_1
              1        NaN  0.20442     NaN    1.1  profile_1
              2        NaN  0.20446     NaN    1.2  profile_1
              0        NaN      NaN  15.188    1.2  profile_1
              2  38.200001      NaN     NaN    1.3  profile_1
              1        NaN      NaN  15.182    1.3  profile_1
              2        NaN      NaN  15.182    1.4  profile_1

              Obviously, if you have a lot of dataframes, just make a list and loop through them.
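
A note for newer pandas versions: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, where `pd.concat` does the same stacking. Stacking alone also leaves one row per measurement (nine rows above), not one row per depth. The sketch below is one possible way to collapse them, assuming each variable is measured at most once per (profile, depth) pair; the `groupby(...).first()` step is an addition for illustration and is not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

dfs = [df1, df2, df3]

# pd.concat stacks the frames row-wise; columns missing from a frame become NaN.
stacked = pd.concat(dfs, ignore_index=True).sort_values(['profile', 'depth'])

# first() keeps the first non-null value of each column within a group,
# which folds the stacked rows into one row per (profile, depth).
combined = stacked.groupby(['profile', 'depth'], as_index=False).first()
print(combined)
```

If a variable can occur more than once for the same key pair, `first()` silently keeps only one value, so a different aggregation may be needed in that case.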






              answered 15 hours ago, edited 15 hours ago – BlivetWidget













              • Thank you! @BlivetWidget, how do you sort it by both depth and profile? Each profile has a set of depths, and each dataframe has a bunch of profiles.
                – PEBKAC
                10 hours ago

              • @PEBKAC You can sort by as many columns as you want, in whatever order you want: .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). Check the help on df1.sort_values to learn how to change the sort order, sort in place, and use the other optional parameters.
                – BlivetWidget
                10 hours ago

              • Thank you, most helpful!
                – PEBKAC
                10 hours ago
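
For illustration, sorting on both keys might look like the sketch below. The two-profile sample frame and its values are made up for this example and are not taken from the question.

```python
import pandas as pd

# Hypothetical sample with two profiles; the values are invented for illustration.
df = pd.DataFrame({
    "profile": ["profile_2", "profile_1", "profile_2", "profile_1"],
    "depth": [0.6, 1.3, 0.5, 0.6],
    "VAR1": [38.1, 38.2, 38.3, 38.4],
})

# Sort by profile first, then by depth within each profile.
by_profile = df.sort_values(["profile", "depth"]).reset_index(drop=True)

# One direction per key: profiles ascending, depths descending.
deepest_first = df.sort_values(["profile", "depth"], ascending=[True, False])

print(by_profile)
print(deepest_first)
```

The order of the column list decides which key takes priority, and `ascending` accepts one flag per key.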

































              Why not concatenate all the DataFrames, melt them, then pivot them back using your id columns? There might be a more efficient way to do this, but this works.

              df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
              df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')

              Where df_pivot will be

              variable              VAR1     VAR2    VAR3
              profile   depth
              profile_1 0.5    38.196202      NaN     NaN
                        0.6    38.198002  0.20440     NaN
                        1.1          NaN  0.20442     NaN
                        1.2          NaN  0.20446  15.188
                        1.3    38.200001      NaN  15.182
                        1.4          NaN      NaN  15.182
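
The pivoted result keeps profile and depth in a MultiIndex. If a flat table like the one in the question is wanted, the index can be reset afterwards; the sketch below shows one way, with `long_df`, `wide` and `flat` as names chosen here for illustration. Note that `pivot_table` aggregates with the mean by default, so repeated measurements for the same profile, depth and variable would be averaged.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Long format: one row per (profile, depth, variable) with its value (NaN where absent).
long_df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])

# Wide format: one column per variable, indexed by (profile, depth).
wide = long_df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')

# Turn the index levels back into ordinary columns and drop the 'variable' axis label.
flat = wide.reset_index()
flat.columns.name = None
print(flat)
```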





                  answered 15 hours ago – SEpapoulis





































                      You can also use:

                      dfs = [df1, df2, df3]
                      # Merge the first two frames, then fold each remaining frame into the result.
                      df = pd.merge(dfs[0], dfs[1], left_on=['depth','profile'], right_on=['depth','profile'], how='outer')
                      for d in dfs[2:]:
                          df = pd.merge(df, d, left_on=['depth','profile'], right_on=['depth','profile'], how='outer')

                         depth       VAR1    profile     VAR2    VAR3
                      0    0.5  38.196202  profile_1      NaN     NaN
                      1    0.6  38.198002  profile_1  0.20440     NaN
                      2    1.3  38.200001  profile_1      NaN  15.182
                      3    1.1        NaN  profile_1  0.20442     NaN
                      4    1.2        NaN  profile_1  0.20446  15.188
                      5    1.4        NaN  profile_1      NaN  15.182
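
Since the left and right keys are identical, the same loop can be written with `on=`. The sketch below wraps it in a helper; `merge_all` is a hypothetical name introduced here, and the `validate="one_to_one"` check and the final sort are additions for illustration, assuming each frame holds at most one row per (profile, depth) pair.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})


def merge_all(dfs, keys):
    """Outer-merge a list of frames on `keys`, left to right (hypothetical helper)."""
    merged = dfs[0]
    for d in dfs[1:]:
        # validate raises pandas.errors.MergeError if a key combination
        # is repeated on either side, instead of silently multiplying rows.
        merged = pd.merge(merged, d, on=keys, how='outer', validate='one_to_one')
    return merged.sort_values(keys).reset_index(drop=True)


df = merge_all([df1, df2, df3], keys=['profile', 'depth'])
print(df)
```

Dropping `validate` restores the permissive behaviour of the loop above if repeated keys are expected.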





                          answered 15 hours ago – heena bawa





























