Count letter frequency in word list, excluding duplicates in the same word
I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.
For example if i have:
words=["tree","bone","indigo","developer"]
The frequency will be:
letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}
As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.
This is the algorithm that I came up with, it's implemented in Python:
for word in words:
count=0;
for letter in word:
if(letter.isalpha()):
if((letters[letter.lower()] > 0 && count == 0) ||
(letters[letter.lower()] == 0 && count == 0)):
letters[letter.lower()]+=1
count=1
elif(letters[letter.lower()]==0 && count==1):
letters[letter.lower()]+=1
But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.
python algorithm
add a comment |
I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.
For example if i have:
words=["tree","bone","indigo","developer"]
The frequency will be:
letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}
As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.
This is the algorithm that I came up with, it's implemented in Python:
for word in words:
count=0;
for letter in word:
if(letter.isalpha()):
if((letters[letter.lower()] > 0 && count == 0) ||
(letters[letter.lower()] == 0 && count == 0)):
letters[letter.lower()]+=1
count=1
elif(letters[letter.lower()]==0 && count==1):
letters[letter.lower()]+=1
But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.
python algorithm
add a comment |
I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.
For example if i have:
words=["tree","bone","indigo","developer"]
The frequency will be:
letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}
As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.
This is the algorithm that I came up with, it's implemented in Python:
for word in words:
count=0;
for letter in word:
if(letter.isalpha()):
if((letters[letter.lower()] > 0 && count == 0) ||
(letters[letter.lower()] == 0 && count == 0)):
letters[letter.lower()]+=1
count=1
elif(letters[letter.lower()]==0 && count==1):
letters[letter.lower()]+=1
But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.
python algorithm
I'm trying to find the most frequent letter in a list of words. I'm struggling with the algorithm because I need to count the letter frequency in a word only once skipping duplicates, so I need help finding a way to count the frequency of the letters in the entire list with only one occurrence per word, ignoring the second occurrence.
For example if i have:
words=["tree","bone","indigo","developer"]
The frequency will be:
letters={a:0, b:1, c:0, d:2, e:3, f:0, g:1, h:0, i:1, j:0, k:0, l:1, m:0, n:2, o:3, p:1, q:0, r:2, s:0, t:1, u:0, v:1, w:0, x:0, y:0, z:0}
As you can see from the letters dictionary: 'e' is 3 and not 5 because if 'e' repeats more than once in the same word it should be ignored.
This is the algorithm that I came up with, it's implemented in Python:
for word in words:
count=0;
for letter in word:
if(letter.isalpha()):
if((letters[letter.lower()] > 0 && count == 0) ||
(letters[letter.lower()] == 0 && count == 0)):
letters[letter.lower()]+=1
count=1
elif(letters[letter.lower()]==0 && count==1):
letters[letter.lower()]+=1
But it still requires work and I can't think about anything else, I'd be glad to anyone who will help me to think about a working solution.
python algorithm
python algorithm
edited 1 hour ago
Prune
43.1k143456
43.1k143456
asked 2 hours ago
MattGeekMattGeek
607
607
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
A variation on @Primusa answer without using update:
from collections import Counter
words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())
Output
Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})
Basically convert each word to a set and then iterate over each set.
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
add a comment |
Create a counter object and then update it with sets for each word:
from collections import Counter
c = Counter()
for word in wordlist:
c.update(set(word.lower()))
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using aset
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would ratherc.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that
– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
|
show 2 more comments
One without Counter
words=["tree","bone","indigo","developer"]
d={}
for word in words: # iterate over words
for i in set(word): # to remove the duplication of characters within word
d[i]=d.get(i,0)+1
Output
{'b': 1,
'd': 2,
'e': 3,
'g': 1,
'i': 1,
'l': 1,
'n': 2,
'o': 3,
'p': 1,
'r': 2,
't': 1,
'v': 1}
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
add a comment |
Comparing speed of the solutions presented so far:
def f1(words):
c = Counter()
for word in words:
c.update(set(word.lower()))
return c
def f2(words):
return Counter(
c
for word in words
for c in set(word.lower()))
def f3(words):
d = {}
for word in words:
for i in set(word.lower()):
d[i] = d.get(i, 0) + 1
return d
My timing function (using different sizes for the list of words):
word_list = [
'tree', 'bone', 'indigo', 'developer', 'python',
'language', 'timeit', 'xerox', 'printer', 'offset',
]
for exp in range(5):
words = word_list * 10**exp
result_list =
for i in range(1, 4):
t = timeit.timeit(
'f(words)',
'from __main__ import words, f{} as f'.format(i),
number=100)
result_list.append((i, t))
print('{:10,d} words | {}'.format(
len(words),
' | '.join(
'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))
The results:
10 words | f1 0.0028 sec | f2 0.0012 sec | f3 0.0011 sec
100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec
The Counter
with list comprehension (here as f2()
) seems to be the fastest. Using counter.update()
seems to be a slow point (here as f1()
).
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54223703%2fcount-letter-frequency-in-word-list-excluding-duplicates-in-the-same-word%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
A variation on @Primusa answer without using update:
from collections import Counter
words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())
Output
Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})
Basically convert each word to a set and then iterate over each set.
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
add a comment |
A variation on @Primusa answer without using update:
from collections import Counter
words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())
Output
Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})
Basically convert each word to a set and then iterate over each set.
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
add a comment |
A variation on @Primusa answer without using update:
from collections import Counter
words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())
Output
Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})
Basically convert each word to a set and then iterate over each set.
A variation on @Primusa answer without using update:
from collections import Counter
words = ["tree", "bone", "indigo", "developer"]
counts = Counter(c for word in words for c in set(word.lower()) if c.isalpha())
Output
Counter({'e': 3, 'o': 3, 'r': 2, 'd': 2, 'n': 2, 'p': 1, 'i': 1, 'b': 1, 'v': 1, 'g': 1, 'l': 1, 't': 1})
Basically convert each word to a set and then iterate over each set.
edited 31 mins ago
answered 2 hours ago
Daniel MesejoDaniel Mesejo
15.8k21029
15.8k21029
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
add a comment |
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
You need to lowercase the word or it fails when words contain capital letters
– Primusa
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
@Primusa, you were right! Fixed.
– Daniel Mesejo
1 hour ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
Thanks, this solution is most complete one.
– MattGeek
45 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
it's not really complete as long as non-alphabetic characters are counted too
– Walter Tross
44 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
@WalterTross Fixed!
– Daniel Mesejo
30 mins ago
add a comment |
Create a counter object and then update it with sets for each word:
from collections import Counter
c = Counter()
for word in wordlist:
c.update(set(word.lower()))
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using aset
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would ratherc.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that
– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
|
show 2 more comments
Create a counter object and then update it with sets for each word:
from collections import Counter
c = Counter()
for word in wordlist:
c.update(set(word.lower()))
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using aset
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would ratherc.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that
– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
|
show 2 more comments
Create a counter object and then update it with sets for each word:
from collections import Counter
c = Counter()
for word in wordlist:
c.update(set(word.lower()))
Create a counter object and then update it with sets for each word:
from collections import Counter
c = Counter()
for word in wordlist:
c.update(set(word.lower()))
answered 2 hours ago
PrimusaPrimusa
4,9531425
4,9531425
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using aset
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would ratherc.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that
– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
|
show 2 more comments
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using aset
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would ratherc.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that
– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
2
2
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
It would be helpful to compare the time complexity of this solution to the one provided by OP
– Jordan Singer
2 hours ago
2
2
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a
set
– Primusa
1 hour ago
@JordanSinger I think they're the same time complexity, both solutions iterate over every character in every word; mine just screens for duplicates using a
set
– Primusa
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
Right, I suggested that because OP was interested in efficiency.
– Jordan Singer
1 hour ago
I would rather
c.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that– Walter Tross
46 mins ago
I would rather
c.update(filter(lambda x: x.isalpha(), set(word.lower()))
or something like that– Walter Tross
46 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
@WalterTross the question states that the input is a list of words so I didn't consider punctuation or spaces, but did consider capital letters
– Primusa
44 mins ago
|
show 2 more comments
One without Counter
words=["tree","bone","indigo","developer"]
d={}
for word in words: # iterate over words
for i in set(word): # to remove the duplication of characters within word
d[i]=d.get(i,0)+1
Output
{'b': 1,
'd': 2,
'e': 3,
'g': 1,
'i': 1,
'l': 1,
'n': 2,
'o': 3,
'p': 1,
'r': 2,
't': 1,
'v': 1}
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
add a comment |
One without Counter
words=["tree","bone","indigo","developer"]
d={}
for word in words: # iterate over words
for i in set(word): # to remove the duplication of characters within word
d[i]=d.get(i,0)+1
Output
{'b': 1,
'd': 2,
'e': 3,
'g': 1,
'i': 1,
'l': 1,
'n': 2,
'o': 3,
'p': 1,
'r': 2,
't': 1,
'v': 1}
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
add a comment |
One without Counter
words=["tree","bone","indigo","developer"]
d={}
for word in words: # iterate over words
for i in set(word): # to remove the duplication of characters within word
d[i]=d.get(i,0)+1
Output
{'b': 1,
'd': 2,
'e': 3,
'g': 1,
'i': 1,
'l': 1,
'n': 2,
'o': 3,
'p': 1,
'r': 2,
't': 1,
'v': 1}
One without Counter
words=["tree","bone","indigo","developer"]
d={}
for word in words: # iterate over words
for i in set(word): # to remove the duplication of characters within word
d[i]=d.get(i,0)+1
Output
{'b': 1,
'd': 2,
'e': 3,
'g': 1,
'i': 1,
'l': 1,
'n': 2,
'o': 3,
'p': 1,
'r': 2,
't': 1,
'v': 1}
answered 2 hours ago
mad_mad_
3,88011020
3,88011020
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
add a comment |
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
Thanks, for your effort. This might be useful to people who want to implement the algorithm on their own.
– MattGeek
43 mins ago
add a comment |
Comparing speed of the solutions presented so far:
def f1(words):
c = Counter()
for word in words:
c.update(set(word.lower()))
return c
def f2(words):
return Counter(
c
for word in words
for c in set(word.lower()))
def f3(words):
d = {}
for word in words:
for i in set(word.lower()):
d[i] = d.get(i, 0) + 1
return d
My timing function (using different sizes for the list of words):
word_list = [
'tree', 'bone', 'indigo', 'developer', 'python',
'language', 'timeit', 'xerox', 'printer', 'offset',
]
for exp in range(5):
words = word_list * 10**exp
result_list =
for i in range(1, 4):
t = timeit.timeit(
'f(words)',
'from __main__ import words, f{} as f'.format(i),
number=100)
result_list.append((i, t))
print('{:10,d} words | {}'.format(
len(words),
' | '.join(
'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))
The results:
10 words | f1 0.0028 sec | f2 0.0012 sec | f3 0.0011 sec
100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec
The Counter
with list comprehension (here as f2()
) seems to be the fastest. Using counter.update()
seems to be a slow point (here as f1()
).
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
add a comment |
Comparing speed of the solutions presented so far:
def f1(words):
c = Counter()
for word in words:
c.update(set(word.lower()))
return c
def f2(words):
return Counter(
c
for word in words
for c in set(word.lower()))
def f3(words):
d = {}
for word in words:
for i in set(word.lower()):
d[i] = d.get(i, 0) + 1
return d
My timing function (using different sizes for the list of words):
word_list = [
'tree', 'bone', 'indigo', 'developer', 'python',
'language', 'timeit', 'xerox', 'printer', 'offset',
]
for exp in range(5):
words = word_list * 10**exp
result_list =
for i in range(1, 4):
t = timeit.timeit(
'f(words)',
'from __main__ import words, f{} as f'.format(i),
number=100)
result_list.append((i, t))
print('{:10,d} words | {}'.format(
len(words),
' | '.join(
'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))
The results:
10 words | f1 0.0028 sec | f2 0.0012 sec | f3 0.0011 sec
100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec
The Counter
with list comprehension (here as f2()
) seems to be the fastest. Using counter.update()
seems to be a slow point (here as f1()
).
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
add a comment |
Comparing speed of the solutions presented so far:
def f1(words):
c = Counter()
for word in words:
c.update(set(word.lower()))
return c
def f2(words):
return Counter(
c
for word in words
for c in set(word.lower()))
def f3(words):
d = {}
for word in words:
for i in set(word.lower()):
d[i] = d.get(i, 0) + 1
return d
My timing function (using different sizes for the list of words):
word_list = [
'tree', 'bone', 'indigo', 'developer', 'python',
'language', 'timeit', 'xerox', 'printer', 'offset',
]
for exp in range(5):
words = word_list * 10**exp
result_list =
for i in range(1, 4):
t = timeit.timeit(
'f(words)',
'from __main__ import words, f{} as f'.format(i),
number=100)
result_list.append((i, t))
print('{:10,d} words | {}'.format(
len(words),
' | '.join(
'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))
The results:
10 words | f1 0.0028 sec | f2 0.0012 sec | f3 0.0011 sec
100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec
The Counter
with list comprehension (here as f2()
) seems to be the fastest. Using counter.update()
seems to be a slow point (here as f1()
).
Comparing speed of the solutions presented so far:
def f1(words):
c = Counter()
for word in words:
c.update(set(word.lower()))
return c
def f2(words):
return Counter(
c
for word in words
for c in set(word.lower()))
def f3(words):
d = {}
for word in words:
for i in set(word.lower()):
d[i] = d.get(i, 0) + 1
return d
My timing function (using different sizes for the list of words):
word_list = [
'tree', 'bone', 'indigo', 'developer', 'python',
'language', 'timeit', 'xerox', 'printer', 'offset',
]
for exp in range(5):
words = word_list * 10**exp
result_list =
for i in range(1, 4):
t = timeit.timeit(
'f(words)',
'from __main__ import words, f{} as f'.format(i),
number=100)
result_list.append((i, t))
print('{:10,d} words | {}'.format(
len(words),
' | '.join(
'f{} {:8.4f} sec'.format(i, t) for i, t in result_list)))
The results:
10 words | f1 0.0028 sec | f2 0.0012 sec | f3 0.0011 sec
100 words | f1 0.0245 sec | f2 0.0082 sec | f3 0.0113 sec
1,000 words | f1 0.2450 sec | f2 0.0812 sec | f3 0.1134 sec
10,000 words | f1 2.4601 sec | f2 0.8113 sec | f3 1.1335 sec
100,000 words | f1 24.4195 sec | f2 8.1828 sec | f3 11.2167 sec
The Counter
with list comprehension (here as f2()
) seems to be the fastest. Using counter.update()
seems to be a slow point (here as f1()
).
edited 51 mins ago
answered 1 hour ago
RalfRalf
5,0284933
5,0284933
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
add a comment |
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
@Primusa ups, my bad. I updated with new results, but the conclusion is the same...
– Ralf
51 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
Thanks for this good comparison.
– MattGeek
46 mins ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54223703%2fcount-letter-frequency-in-word-list-excluding-duplicates-in-the-same-word%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown