Count letter frequency in word list, excluding duplicates in the same word












6















I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.



For example if i have:



words=["tree","bone","indigo","developer"]


The frequency will be:



letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}


As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.



This is the algorithm that I came up with, it's implemented in Python:



for word in words:
count=0;

for letter in word:
if(letter.isalpha()):
if((letters[letter.lower()] > 0 && count == 0) ||
(letters[letter.lower()] == 0 && count == 0)):

letters[letter.lower()]+=1
count=1

elif(letters[letter.lower()]==0 && count==1):
letters[letter.lower()]+=1


But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.










share|improve this question





























    6















    I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.



    For example if i have:



    words=["tree","bone","indigo","developer"]


    The frequency will be:



    letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}


    As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.



    This is the algorithm that I came up with, it's implemented in Python:



    for word in words:
    count=0;

    for letter in word:
    if(letter.isalpha()):
    if((letters[letter.lower()] > 0 && count == 0) ||
    (letters[letter.lower()] == 0 && count == 0)):

    letters[letter.lower()]+=1
    count=1

    elif(letters[letter.lower()]==0 && count==1):
    letters[letter.lower()]+=1


    But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.










    share|improve this question



























      6












      6








      6


      1






      I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.



      For example if i have:



      words=["tree","bone","indigo","developer"]


      The frequency will be:



      letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}


      As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.



      This is the algorithm that I came up with, it's implemented in Python:



      for word in words:
      count=0;

      for letter in word:
      if(letter.isalpha()):
      if((letters[letter.lower()] > 0 && count == 0) ||
      (letters[letter.lower()] == 0 && count == 0)):

      letters[letter.lower()]+=1
      count=1

      elif(letters[letter.lower()]==0 && count==1):
      letters[letter.lower()]+=1


      But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.










      share|improve this question
















      I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.



      For example if i have:



      words=["tree","bone","indigo","developer"]


      The frequency will be:



      letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}


      As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.



      This is the algorithm that I came up with, it's implemented in Python:



      for word in words:
      count=0;

      for letter in word:
      if(letter.isalpha()):
      if((letters[letter.lower()] > 0 && count == 0) ||
      (letters[letter.lower()] == 0 && count == 0)):

      letters[letter.lower()]+=1
      count=1

      elif(letters[letter.lower()]==0 && count==1):
      letters[letter.lower()]+=1


      But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.







      python algorithm






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 1 hour ago









      Prune

      43.1k143456




      43.1k143456










      asked 2 hours ago









      MattGeekMattGeek

      607




      607
























          4 Answers
          4






          active

          oldest

          votes


















          9














          A variation on @Primusa answer without using update:



          from collections import Counter

          words = ["tree", "bone", "indigo", "developer"]
          counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())


          Output



          Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})


          Basically convert each word to a set and then iterate over each set.






          share|improve this answer


























          • You need to lowercase the word or it fails when words contain capital letters

            – Primusa
            1 hour ago











          • @Primusa, you were right! Fixed.

            – Daniel Mesejo
            1 hour ago











          • Thanks, this solution is most complete one.

            – MattGeek
            45 mins ago











          • it's not really complete as long as non-alphabetic characters are counted too

            – Walter Tross
            44 mins ago











          • @WalterTross Fixed!

            – Daniel Mesejo
            30 mins ago



















          7














          Create a counter object and then update it with sets for each word:



          from collections import Counter
          c = Counter()

          for word in wordlist:
          c.update(set(word.lower()))





          share|improve this answer



















          • 2





            It would be helpful to compare the time complexity of this solution to the one provided by OP

            – Jordan Singer
            2 hours ago






          • 2





            @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

            – Primusa
            1 hour ago











          • Right, I suggested that because OP was interested in efficiency.

            – Jordan Singer
            1 hour ago











          • I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

            – Walter Tross
            46 mins ago











          • @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

            – Primusa
            44 mins ago



















          5














          One without Counter



          words=["tree","bone","indigo","developer"]
          d={}
          for word in words: # iterate over words
          for i in set(word): # to remove the duplication of characters within word
          d[i]=d.get(i,0)+1


          Output



          {'b': 1,
          'd': 2,
          'e': 3,
          'g': 1,
          'i': 1,
          'l': 1,
          'n': 2,
          'o': 3,
          'p': 1,
          'r': 2,
          't': 1,
          'v': 1}





          share|improve this answer
























          • Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

            – MattGeek
            43 mins ago



















          1














          Comparing speed of the solutions presented so far:



          def f1(words):
          c = Counter()
          for word in words:
          c.update(set(word.lower()))
          return c

          def f2(words):
          return Counter(
          c
          for word in words
          for c in set(word.lower()))

          def f3(words):
          d = {}
          for word in words:
          for i in set(word.lower()):
          d[i] = d.get(i, 0) + 1
          return d


          My timing function (using different sizes for the list of words):



          word_list = [
          'tree', 'bone', 'indigo', 'developer', 'python',
          'language', 'timeit', 'xerox', 'printer', 'offset',
          ]

          for exp in range(5):
          words = word_list * 10**exp

          result_list =
          for i in range(1, 4):
          t = timeit.timeit(
          'f(words)',
          'from __main__ import words, f{} as f'.format(i),
          number=100)
          result_list.append((i, t))

          print('{:10,d} words | {}'.format(
          len(words),
          ' | '.join(
          'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))


          The results:



                  10 words | f1   0.0028 sec | f2   0.0012 sec | f3   0.0011 sec
          100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
          1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
          10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
          100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec


          The Counter with list comprehension (here as f2()) seems to be the fastest. Using counter.update() seems to be a slow point (here as f1()).






          share|improve this answer


























          • @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

            – Ralf
            51 mins ago











          • Thanks for this good comparison.

            – MattGeek
            46 mins ago











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54223703%2fcount-letter-frequency-in-word-list-excluding-duplicates-in-the-same-word%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          4 Answers
          4






          active

          oldest

          votes








          4 Answers
          4






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          9














          A variation on @Primusa answer without using update:



          from collections import Counter

          words = ["tree", "bone", "indigo", "developer"]
          counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())


          Output



          Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})


          Basically convert each word to a set and then iterate over each set.






          share|improve this answer


























          • You need to lowercase the word or it fails when words contain capital letters

            – Primusa
            1 hour ago











          • @Primusa, you were right! Fixed.

            – Daniel Mesejo
            1 hour ago











          • Thanks, this solution is most complete one.

            – MattGeek
            45 mins ago











          • it's not really complete as long as non-alphabetic characters are counted too

            – Walter Tross
            44 mins ago











          • @WalterTross Fixed!

            – Daniel Mesejo
            30 mins ago
















          9














          A variation on @Primusa answer without using update:



          from collections import Counter

          words = ["tree", "bone", "indigo", "developer"]
          counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())


          Output



          Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})


          Basically convert each word to a set and then iterate over each set.






          share|improve this answer


























          • You need to lowercase the word or it fails when words contain capital letters

            – Primusa
            1 hour ago











          • @Primusa, you were right! Fixed.

            – Daniel Mesejo
            1 hour ago











          • Thanks, this solution is most complete one.

            – MattGeek
            45 mins ago











          • it's not really complete as long as non-alphabetic characters are counted too

            – Walter Tross
            44 mins ago











          • @WalterTross Fixed!

            – Daniel Mesejo
            30 mins ago














          9












          9








          9







          A variation on @Primusa answer without using update:



          from collections import Counter

          words = ["tree", "bone", "indigo", "developer"]
          counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())


          Output



          Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})


          Basically convert each word to a set and then iterate over each set.






          share|improve this answer















          A variation on @Primusa answer without using update:



          from collections import Counter

          words = ["tree", "bone", "indigo", "developer"]
          counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())


          Output



          Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})


          Basically convert each word to a set and then iterate over each set.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 31 mins ago

























          answered 2 hours ago









          Daniel MesejoDaniel Mesejo

          15.8k21029




          15.8k21029













          • You need to lowercase the word or it fails when words contain capital letters

            – Primusa
            1 hour ago











          • @Primusa, you were right! Fixed.

            – Daniel Mesejo
            1 hour ago











          • Thanks, this solution is most complete one.

            – MattGeek
            45 mins ago











          • it's not really complete as long as non-alphabetic characters are counted too

            – Walter Tross
            44 mins ago











          • @WalterTross Fixed!

            – Daniel Mesejo
            30 mins ago



















          • You need to lowercase the word or it fails when words contain capital letters

            – Primusa
            1 hour ago











          • @Primusa, you were right! Fixed.

            – Daniel Mesejo
            1 hour ago











          • Thanks, this solution is most complete one.

            – MattGeek
            45 mins ago











          • it's not really complete as long as non-alphabetic characters are counted too

            – Walter Tross
            44 mins ago











          • @WalterTross Fixed!

            – Daniel Mesejo
            30 mins ago

















          You need to lowercase the word or it fails when words contain capital letters

          – Primusa
          1 hour ago





          You need to lowercase the word or it fails when words contain capital letters

          – Primusa
          1 hour ago













          @Primusa, you were right! Fixed.

          – Daniel Mesejo
          1 hour ago





          @Primusa, you were right! Fixed.

          – Daniel Mesejo
          1 hour ago













          Thanks, this solution is most complete one.

          – MattGeek
          45 mins ago





          Thanks, this solution is most complete one.

          – MattGeek
          45 mins ago













          it's not really complete as long as non-alphabetic characters are counted too

          – Walter Tross
          44 mins ago





          it's not really complete as long as non-alphabetic characters are counted too

          – Walter Tross
          44 mins ago













          @WalterTross Fixed!

          – Daniel Mesejo
          30 mins ago





          @WalterTross Fixed!

          – Daniel Mesejo
          30 mins ago













          7














          Create a counter object and then update it with sets for each word:



          from collections import Counter
          c = Counter()

          for word in wordlist:
          c.update(set(word.lower()))





          share|improve this answer



















          • 2





            It would be helpful to compare the time complexity of this solution to the one provided by OP

            – Jordan Singer
            2 hours ago






          • 2





            @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

            – Primusa
            1 hour ago











          • Right, I suggested that because OP was interested in efficiency.

            – Jordan Singer
            1 hour ago











          • I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

            – Walter Tross
            46 mins ago











          • @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

            – Primusa
            44 mins ago
















          7














          Create a counter object and then update it with sets for each word:



          from collections import Counter
          c = Counter()

          for word in wordlist:
          c.update(set(word.lower()))





          share|improve this answer



















          • 2





            It would be helpful to compare the time complexity of this solution to the one provided by OP

            – Jordan Singer
            2 hours ago






          • 2





            @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

            – Primusa
            1 hour ago











          • Right, I suggested that because OP was interested in efficiency.

            – Jordan Singer
            1 hour ago











          • I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

            – Walter Tross
            46 mins ago











          • @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

            – Primusa
            44 mins ago














          7












          7








          7







          Create a counter object and then update it with sets for each word:



          from collections import Counter
          c = Counter()

          for word in wordlist:
          c.update(set(word.lower()))





          share|improve this answer













          Create a counter object and then update it with sets for each word:



          from collections import Counter
          c = Counter()

          for word in wordlist:
          c.update(set(word.lower()))






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 hours ago









          PrimusaPrimusa

          4,9531425




          4,9531425








          • 2





            It would be helpful to compare the time complexity of this solution to the one provided by OP

            – Jordan Singer
            2 hours ago






          • 2





            @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

            – Primusa
            1 hour ago











          • Right, I suggested that because OP was interested in efficiency.

            – Jordan Singer
            1 hour ago











          • I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

            – Walter Tross
            46 mins ago











          • @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

            – Primusa
            44 mins ago














          • 2





            It would be helpful to compare the time complexity of this solution to the one provided by OP

            – Jordan Singer
            2 hours ago






          • 2





            @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

            – Primusa
            1 hour ago











          • Right, I suggested that because OP was interested in efficiency.

            – Jordan Singer
            1 hour ago











          • I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

            – Walter Tross
            46 mins ago











          • @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

            – Primusa
            44 mins ago








          2




          2





          It would be helpful to compare the time complexity of this solution to the one provided by OP

          – Jordan Singer
          2 hours ago





          It would be helpful to compare the time complexity of this solution to the one provided by OP

          – Jordan Singer
          2 hours ago




          2




          2





          @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

          – Primusa
          1 hour ago





          @JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a set

          – Primusa
          1 hour ago













          Right, I suggested that because OP was interested in efficiency.

          – Jordan Singer
          1 hour ago





          Right, I suggested that because OP was interested in efficiency.

          – Jordan Singer
          1 hour ago













          I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

          – Walter Tross
          46 mins ago





          I would rather c.update(filter(lambda x: x.isalpha(), set(word.lower())) or something like that

          – Walter Tross
          46 mins ago













          @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

          – Primusa
          44 mins ago





          @WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters

          – Primusa
          44 mins ago











          5














          One without Counter



          words=["tree","bone","indigo","developer"]
          d={}
          for word in words: # iterate over words
          for i in set(word): # to remove the duplication of characters within word
          d[i]=d.get(i,0)+1


          Output



          {'b': 1,
          'd': 2,
          'e': 3,
          'g': 1,
          'i': 1,
          'l': 1,
          'n': 2,
          'o': 3,
          'p': 1,
          'r': 2,
          't': 1,
          'v': 1}





          share|improve this answer
























          • Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

            – MattGeek
            43 mins ago
















          5














          One without Counter



          words=["tree","bone","indigo","developer"]
          d={}
          for word in words: # iterate over words
          for i in set(word): # to remove the duplication of characters within word
          d[i]=d.get(i,0)+1


          Output



          {'b': 1,
          'd': 2,
          'e': 3,
          'g': 1,
          'i': 1,
          'l': 1,
          'n': 2,
          'o': 3,
          'p': 1,
          'r': 2,
          't': 1,
          'v': 1}





          share|improve this answer
























          • Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

            – MattGeek
            43 mins ago














          5












          5








          5







          One without Counter



          words=["tree","bone","indigo","developer"]
          d={}
          for word in words: # iterate over words
          for i in set(word): # to remove the duplication of characters within word
          d[i]=d.get(i,0)+1


          Output



          {'b': 1,
          'd': 2,
          'e': 3,
          'g': 1,
          'i': 1,
          'l': 1,
          'n': 2,
          'o': 3,
          'p': 1,
          'r': 2,
          't': 1,
          'v': 1}





          share|improve this answer













          One without Counter



          words=["tree","bone","indigo","developer"]
          d={}
          for word in words: # iterate over words
          for i in set(word): # to remove the duplication of characters within word
          d[i]=d.get(i,0)+1


          Output



          {'b': 1,
          'd': 2,
          'e': 3,
          'g': 1,
          'i': 1,
          'l': 1,
          'n': 2,
          'o': 3,
          'p': 1,
          'r': 2,
          't': 1,
          'v': 1}






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 hours ago









          mad_mad_

          3,88011020




          3,88011020













          • Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

            – MattGeek
            43 mins ago



















          • Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

            – MattGeek
            43 mins ago

















          Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

          – MattGeek
          43 mins ago





          Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.

          – MattGeek
          43 mins ago











          1














          Comparing speed of the solutions presented so far:



          def f1(words):
          c = Counter()
          for word in words:
          c.update(set(word.lower()))
          return c

          def f2(words):
          return Counter(
          c
          for word in words
          for c in set(word.lower()))

          def f3(words):
          d = {}
          for word in words:
          for i in set(word.lower()):
          d[i] = d.get(i, 0) + 1
          return d


          My timing function (using different sizes for the list of words):



          word_list = [
          'tree', 'bone', 'indigo', 'developer', 'python',
          'language', 'timeit', 'xerox', 'printer', 'offset',
          ]

          for exp in range(5):
          words = word_list * 10**exp

          result_list =
          for i in range(1, 4):
          t = timeit.timeit(
          'f(words)',
          'from __main__ import words, f{} as f'.format(i),
          number=100)
          result_list.append((i, t))

          print('{:10,d} words | {}'.format(
          len(words),
          ' | '.join(
          'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))


          The results:



                  10 words | f1   0.0028 sec | f2   0.0012 sec | f3   0.0011 sec
          100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
          1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
          10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
          100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec


          The Counter with list comprehension (here as f2()) seems to be the fastest. Using counter.update() seems to be a slow point (here as f1()).






          share|improve this answer


























          • @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

            – Ralf
            51 mins ago











          • Thanks for this good comparison.

            – MattGeek
            46 mins ago
















          1














          Comparing speed of the solutions presented so far:



          def f1(words):
          c = Counter()
          for word in words:
          c.update(set(word.lower()))
          return c

          def f2(words):
          return Counter(
          c
          for word in words
          for c in set(word.lower()))

          def f3(words):
          d = {}
          for word in words:
          for i in set(word.lower()):
          d[i] = d.get(i, 0) + 1
          return d


          My timing function (using different sizes for the list of words):



          word_list = [
          'tree', 'bone', 'indigo', 'developer', 'python',
          'language', 'timeit', 'xerox', 'printer', 'offset',
          ]

          for exp in range(5):
          words = word_list * 10**exp

          result_list =
          for i in range(1, 4):
          t = timeit.timeit(
          'f(words)',
          'from __main__ import words, f{} as f'.format(i),
          number=100)
          result_list.append((i, t))

          print('{:10,d} words | {}'.format(
          len(words),
          ' | '.join(
          'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))


          The results:



                  10 words | f1   0.0028 sec | f2   0.0012 sec | f3   0.0011 sec
          100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
          1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
          10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
          100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec


          The Counter with list comprehension (here as f2()) seems to be the fastest. Using counter.update() seems to be a slow point (here as f1()).






          share|improve this answer


























          • @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

            – Ralf
            51 mins ago











          • Thanks for this good comparison.

            – MattGeek
            46 mins ago














          1












          1








          1







          Comparing speed of the solutions presented so far:



          def f1(words):
          c = Counter()
          for word in words:
          c.update(set(word.lower()))
          return c

          def f2(words):
          return Counter(
          c
          for word in words
          for c in set(word.lower()))

          def f3(words):
          d = {}
          for word in words:
          for i in set(word.lower()):
          d[i] = d.get(i, 0) + 1
          return d


          My timing function (using different sizes for the list of words):



          word_list = [
          'tree', 'bone', 'indigo', 'developer', 'python',
          'language', 'timeit', 'xerox', 'printer', 'offset',
          ]

          for exp in range(5):
          words = word_list * 10**exp

          result_list =
          for i in range(1, 4):
          t = timeit.timeit(
          'f(words)',
          'from __main__ import words, f{} as f'.format(i),
          number=100)
          result_list.append((i, t))

          print('{:10,d} words | {}'.format(
          len(words),
          ' | '.join(
          'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))


          The results:



                  10 words | f1   0.0028 sec | f2   0.0012 sec | f3   0.0011 sec
          100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
          1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
          10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
          100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec


          The Counter with list comprehension (here as f2()) seems to be the fastest. Using counter.update() seems to be a slow point (here as f1()).






          share|improve this answer















          Comparing speed of the solutions presented so far:



          def f1(words):
          c = Counter()
          for word in words:
          c.update(set(word.lower()))
          return c

          def f2(words):
          return Counter(
          c
          for word in words
          for c in set(word.lower()))

          def f3(words):
          d = {}
          for word in words:
          for i in set(word.lower()):
          d[i] = d.get(i, 0) + 1
          return d


          My timing function (using different sizes for the list of words):



          word_list = [
          'tree', 'bone', 'indigo', 'developer', 'python',
          'language', 'timeit', 'xerox', 'printer', 'offset',
          ]

          for exp in range(5):
          words = word_list * 10**exp

          result_list =
          for i in range(1, 4):
          t = timeit.timeit(
          'f(words)',
          'from __main__ import words, f{} as f'.format(i),
          number=100)
          result_list.append((i, t))

          print('{:10,d} words | {}'.format(
          len(words),
          ' | '.join(
          'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))


          The results:



                  10 words | f1   0.0028 sec | f2   0.0012 sec | f3   0.0011 sec
          100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
          1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
          10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
          100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec


          The Counter with list comprehension (here as f2()) seems to be the fastest. Using counter.update() seems to be a slow point (here as f1()).







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 51 mins ago

























          answered 1 hour ago









          RalfRalf

          5,0284933




          5,0284933













          • @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

            – Ralf
            51 mins ago











          • Thanks for this good comparison.

            – MattGeek
            46 mins ago



















          • @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

            – Ralf
            51 mins ago











          • Thanks for this good comparison.

            – MattGeek
            46 mins ago

















          @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

          – Ralf
          51 mins ago





          @Primusa ups, my bad. I updated with new results, but the conclusion is the same...

          – Ralf
          51 mins ago













          Thanks for this good comparison.

          – MattGeek
          46 mins ago





          Thanks for this good comparison.

          – MattGeek
          46 mins ago


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54223703%2fcount-letter-frequency-in-word-list-excluding-duplicates-in-the-same-word%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Reichsarbeitsdienst

          Tanganjiko

          Norda sulo