Forum Posts Word Count

Debates: 0

Posts: 2,087

01.23.2023 11:20AM

So I was thinking forum post count is a bit of an abstraction from real life. Word count, however, is something we see in a whole lot of places.

I wrote a script to get the word count for people's forum posts. It's not exactly correct: I'm only splitting on whitespace, so words joined by forward slashes or colons get counted as one. Also, I suck at web development, and the only response I could get from this place was HTML, so that was a pain. (I was gonna do debates, debate comments, and questions too, until I realised how much of a pain it was.)
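The whitespace caveat is easy to see with a small comparison. This is a sketch, not what the script does: the regex definition of a "word" (runs of ASCII letters/digits, with inner apostrophes allowed) is my own assumption, and it has its own quirks, e.g. it would split a number like 983,946 or a URL into several pieces.

```python
import re

def count_words_split(text):
    # What the script does: split on runs of whitespace.
    return len(text.split())

def count_words_regex(text):
    # Assumed alternative: runs of letters/digits (plus inner apostrophes) are words,
    # so "and/or" or "dogs:" no longer glue or trail punctuation onto a word.
    return len(re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text))

sample = "Cats and/or dogs: it's fine"
print(count_words_split(sample))  # 5
print(count_words_regex(sample))  # 6
```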

RationalMadman has written 983,946 words on the DebateArt forums. That's roughly equivalent to 14 theses or 11 novels, or the entire Harry Potter series.

That's fun, right?

#2

Debates: 566

Posts: 19,930

01.23.2023 11:30AM

I never related to the way boys or men say so little when they speak. It's like they say shit in the most basic and dull way possible. I'm rather verbose, and I like being that way.

That doesn't mean I speak amazingly 'well' to women, but the reverse is very true. When I hear the average female preacher or lecturer on a topic, I am more likely to comprehend her way of teaching. This happened to me my entire school life through to university, and I could never pinpoint why until I realised it's because women word things in a more strung-along way, whereas men leave gaps, often expecting you to fill in the blanks (men literally talk in bullet points quite often).

#3

Debates: 566

Posts: 19,930

01.23.2023 11:38AM

Not that I am 'embarrassed' to post that much; I know I do that in most places I interact verbally or textually. But I must say your count definitely includes quotes, links, etc. I am not downplaying how high the count is, but some of it is due to quoting.
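One way to leave quoted text out of a count is to skip everything inside quote elements while walking the post's HTML. This sketch assumes forum quotes are rendered as `<blockquote>` elements, which I haven't checked against DebateArt's actual markup, and it uses the standard-library `html.parser` rather than BeautifulSoup so it runs on its own. (With BeautifulSoup, the equivalent idea is calling `.decompose()` on each `soup.find_all("blockquote")` result before reading `.text`.)

```python
from html.parser import HTMLParser

class QuoteSkippingCounter(HTMLParser):
    """Collects text outside <blockquote> elements (assumed quote markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <blockquote> elements we are currently inside
        self.chunks = []  # text fragments that are not inside any quote

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)

def count_words_without_quotes(post_html):
    parser = QuoteSkippingCounter()
    parser.feed(post_html)
    parser.close()  # flush any buffered trailing text
    # Joining fragments with spaces keeps block elements apart, but it also
    # splits a word that is broken up by inline tags like <b>un</b>done.
    return len(" ".join(parser.chunks).split())

html = "<blockquote>RationalMadman has written 983,946 words</blockquote><p>That's a lot.</p>"
print(count_words_without_quotes(html))  # 3
```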

#4

PREZ-HILTON

Debates: 18

Posts: 2,806

01.23.2023 11:55AM

@badger

RationalMadman has written 983,946 words on the DebateArt forums

I am working on fixing this by having Mike make the lifetime word count of an individual 500,000.

#5

Debates: 271

Posts: 7,855

01.23.2023 12:44PM

RationalMadman has written 983,946 words on the DebateArt forums. That's roughly equivalent to 14 theses or 11 novels, or the entire Harry Potter series.

That's a lot.

#6

Debates: 271

Posts: 7,855

01.23.2023 12:46PM

Shila would surpass everyone if she continued her 80 posts per day.

#7

Debates: 167

Posts: 3,837

01.23.2023 01:04PM

@PREZ-HILTON

I am working on fixing this by having Mike make the lifetime word count of an individual 500,000.

So a word limit. Upper bound or lower bound?

If I misunderstood anything, feel free to correct me.

#8

Debates: 167

Posts: 3,837

01.23.2023 01:08PM

The point of posting here is definitely not posting for the sake of posting; on the contrary, that is spam. We are never meant to just post. We see stuff, we present our opinion(s), and that becomes a post or more. That is how it works.

I suggest the default forum leaderboard ranking should be based on the likes-to-posts ratio. At least clickbaity titles are better than spamming YouTube videos.

#9

Debates: 167

Posts: 3,837

01.23.2023 01:12PM

Actually, having the leaderboard based on the aggregate number of likes is probably better.
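The two leaderboard ideas (likes per post vs. total likes) can be compared with a tiny sketch. The member names and numbers below are made up for illustration, not real DebateArt stats, and giving zero-post members a ratio of 0 is my own choice to avoid dividing by zero.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    posts: int
    likes: int

def rank_by_ratio(members):
    # Likes per post; members with zero posts get 0.0 instead of a division error.
    return sorted(members, key=lambda m: m.likes / m.posts if m.posts else 0.0, reverse=True)

def rank_by_total_likes(members):
    # Aggregate likes, regardless of how many posts it took.
    return sorted(members, key=lambda m: m.likes, reverse=True)

# Hypothetical members, invented for this example.
members = [
    Member("prolific", posts=20000, likes=6000),
    Member("selective", posts=500, likes=900),
    Member("newcomer", posts=2, likes=5),
]

print([m.name for m in rank_by_ratio(members)])        # ['newcomer', 'selective', 'prolific']
print([m.name for m in rank_by_total_likes(members)])  # ['prolific', 'selective', 'newcomer']
```

A pure ratio lets someone with two well-liked posts top the board, which is one reason to prefer aggregate likes or to add a minimum post count before someone is ranked by ratio.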

#10

Debates: 16

Posts: 1,067

01.23.2023 07:44PM

@badger

send script

i can help figure out debate comments + questions if u want

Topic's author

#11

Debates: 0

Posts: 2,087

01.23.2023 08:26PM

@BearMan

import re
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

word_count = 0

def count_words(text):
    # Splits on whitespace only, so "this/that" or "a:b" counts as one word.
    words = text.split()
    return len(words)

def fetch(url):
    # Every request returns a full HTML page; read it and decode it as UTF-8.
    with urllib.request.urlopen(url) as o:
        return o.read().decode("utf-8")

def get_post_text(html_string, thread_id, post_id):
    # Returns the word count of one post: find the post's permalink,
    # then the first post-text div after it.
    soup = BeautifulSoup(html_string, 'html5lib')
    post_link = soup.find('a', href=f'/forum/topics/{thread_id}/post-links/{post_id}', rel='nofollow')
    if post_link is None:
        return 0  # post not on the page (deleted, markup changed, etc.)
    post_text_div = post_link.find_next('div', class_='forum-topic-show__post-text', itemprop="text")
    if post_text_div is None:
        return 0
    i = count_words(post_text_div.text)
    return i

# First page of the user's forum posts (the URL has no ?page= parameter).
# s = fetch("https://www.debateart.com/participants/RationalMadman/forum_posts")
# matches = re.findall(r'a href="/forum/topics/(\d+)/post-links/(\d+)', s)
# for topic, post in matches:
#     url = "https://www.debateart.com/forum/topics/" + topic + "/post-links/" + post
#     i = get_post_text(fetch(url), topic, post)
#     word_count += i

curr = 859  # resume point; set to 2 to run from the beginning

while True:
    # urlopen never returns a falsy value, so the old `while urlopen(...)` could
    # only stop by crashing; it also downloaded every listing page twice.
    try:
        s = fetch(f"https://www.debateart.com/participants/RationalMadman/forum_posts?page={curr}")
    except urllib.error.HTTPError:
        break  # assumes a page past the end returns an HTTP error status
    matches = re.findall(r'a href="/forum/topics/(\d+)/post-links/(\d+)', s)
    if not matches:
        break  # an empty listing page also means we are done
    for topic, post in matches:
        url = "https://www.debateart.com/forum/topics/" + topic + "/post-links/" + post
        i = get_post_text(fetch(url), topic, post)
        word_count += i
    print(curr)
    print(word_count)
    curr += 1

print(word_count)

It just takes too long. The site is all PHP and HTML: all you get back is a full HTML page on every request, which you then need to search. That's 7k lines for every post.
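Since nearly all the time goes into waiting on full-page downloads, one common way to cut the wall-clock time is to fetch several post pages at once with a small thread pool. This is a sketch under assumptions: `words_for_post` is a hypothetical stand-in for "download one post page and count its words" (e.g. `fetch` plus `get_post_text`), and `max_workers=4` is an arbitrary, deliberately small number so the site isn't hammered. It doesn't make any page smaller; it only overlaps the waiting.

```python
from concurrent.futures import ThreadPoolExecutor

def total_words(post_ids, words_for_post, max_workers=4):
    """Sum word counts for (topic, post) pairs, fetching a few pages at a time.

    words_for_post(topic, post) -> int is a hypothetical placeholder for
    downloading one post page and counting its words.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; an exception in any worker
        # is re-raised here when its result is consumed by sum().
        counts = pool.map(lambda ids: words_for_post(*ids), post_ids)
        return sum(counts)
```

With a fake counter in place of real downloads, `total_words([("1", "10"), ("1", "11")], lambda t, p: 5)` returns 10.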

#12

sadolite

Debates: 0

Posts: 2,928

01.23.2023 10:38PM

If you wrote one word every second, it would take a little over 11 days to write 983,946 words. That said, over a few years? Meh.
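Checking the one-word-per-second figure, and what the total works out to per day if it's spread out. The 3-year span is a made-up example, since the thread doesn't say how long those posts took to write.

```python
WORDS = 983_946
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# One word per second, around the clock.
days_nonstop = WORDS / SECONDS_PER_DAY
print(round(days_nonstop, 1))  # 11.4

# Hypothetical span, not from the thread.
years = 3
words_per_day = WORDS / (years * 365)
print(round(words_per_day))  # 899
```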

#13

Debates: 16

Posts: 1,067

01.24.2023 10:53PM

@badger

github?

indentation is being screwed up

Topic's author

#14

Debates: 0

Posts: 2,087

01.24.2023 11:47PM

@BearMan

Simple loops, dude. Indent everything under the while loop once. Indent everything under the for loop once more, down to word_count += i. The first commented-out bit gets the first page of forum posts on the user profile; the while loop gets everything else from page=2 on. curr was set to 800 there because I ran it in increments. Set it to 2 to run from the beginning.

Topic's author

#15

Debates: 0

Posts: 2,087

01.24.2023 11:48PM

Everything under the for loop in the first commented-out part is indented once.

Topic's author

#16

Debates: 0

Posts: 2,087

01.24.2023 11:52PM


Topic's author

#17

Debates: 0

Posts: 2,087
