Hockey
Club 4000 Member
Posts: 4,537
|
Post by Hockey on Jul 16, 2019 4:10:25 GMT
I was searching the forums for some old posts, and I got to reading them. I thought it was super cool how you could see my writing and personality change through the years. It kind of hit me then that we should save all of this. ProBoards WILL eventually close. This is 7 years of rich history that we should preserve. Video and I spoke about how we could do this.
This is our current idea for doing it:
The archive would be a publicly viewable webpage that is searchable and browse-able similar to the current forums.
We will create a database that contains every single thread, post, and associated metadata. We will also have a storage server that has things like avatars, important attachments (no videos), and signatures.
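To make the database part a bit more concrete, here is a rough sketch of what it could look like. It assumes SQLite through Python's sqlite3 module, and every table name, column name, and helper function below is a made-up placeholder, not a decided design. Search is just a naive LIKE scan here; a real archive would probably want a proper full-text index.

```python
import sqlite3

# Hypothetical schema: all table and column names are placeholders.
SCHEMA = """
CREATE TABLE threads (
    thread_id INTEGER PRIMARY KEY,
    board     TEXT NOT NULL,
    title     TEXT NOT NULL
);
CREATE TABLE posts (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(thread_id),
    author    TEXT NOT NULL,
    posted_at TEXT NOT NULL,  -- ISO 8601 timestamp in GMT
    body      TEXT NOT NULL
);
CREATE INDEX idx_posts_thread ON posts(thread_id, posted_at);
"""


def open_archive(path=":memory:"):
    """Open (or create) an archive database and set up the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def posts_in_thread(conn, thread_id):
    """Return (author, posted_at, body) for a thread, oldest post first."""
    rows = conn.execute(
        "SELECT author, posted_at, body FROM posts "
        "WHERE thread_id = ? ORDER BY posted_at, post_id",
        (thread_id,),
    )
    return rows.fetchall()


def search_posts(conn, term):
    """Naive substring search; % and _ in term act as LIKE wildcards."""
    rows = conn.execute(
        "SELECT post_id, thread_id, author FROM posts "
        "WHERE body LIKE ? ORDER BY post_id",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_archive()
    conn.execute("INSERT INTO threads VALUES (1, 'General', 'Example thread')")
    conn.execute(
        "INSERT INTO posts VALUES (1, 1, 'Hockey', '2019-07-16T04:10:25', "
        "'we should save all of this')"
    )
    print(posts_in_thread(conn, 1))
    print(search_posts(conn, "save"))
```

Avatars, attachments, and signatures would live on the storage server, so the database would only need to hold a path or URL for them.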
This project would involve several different development cycles:
Firstly, we have to find out the most efficient way to store all of this data for searching/displaying. Then, we need to write an API to write to our database. Additionally, we need to write a browser extension or crawler to archive the data. Video and I figured out that we can crawl the site by simply incrementing thread numbers. Additionally, if we use a program like wget, we (theoretically) can feed it a privileged session ID (for admin threads, which would only become available after permission from Seth) and access the HTML for that thread. If Cloudflare screws us, we can always write a browser extension instead of an automated crawler.
Once the crawler/browser extension is written, we have to begin the archiving. If a browser extension is used, each archived post will be marked with an archived-by-user stamp so that all posts archived by a malicious contributor can be dropped.
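Here is a minimal sketch of how the archived-by-user stamp could work, assuming a SQLite table with a hypothetical `archived_by` column. The table layout and the `drop_contributor` helper are only illustrations, not a settled design.

```python
import sqlite3


def make_table(conn):
    # Hypothetical layout: every archived post records who submitted it.
    conn.execute(
        "CREATE TABLE posts ("
        " post_id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " archived_by TEXT NOT NULL)"
    )


def drop_contributor(conn, contributor):
    """Delete every post submitted by one contributor; return how many."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM posts WHERE archived_by = ?", (contributor,)
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    conn.executemany(
        "INSERT INTO posts (body, archived_by) VALUES (?, ?)",
        [("real post", "trusted_user"), ("tampered post", "bad_user")],
    )
    print(drop_contributor(conn, "bad_user"))
```

Dropped posts would then need to be re-archived by someone else, so it would probably be worth keeping a list of which thread numbers were affected.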
Finally, we need to write a web app to display the data.
This is an idea right now. If the idea became developed enough, we could start pursuing it. It's a huge project but it would be soooooooo cool. Anyone have ideas on better ways to archive the data?
(Also, storage is not really a concern. If every post was this long (which they're not), it would only take about 2 gigabytes to store everything: 2-3 KB * 600,000 posts.)
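For anyone who wants to check the math, here is the estimate as a quick script. It assumes decimal units (1 KB = 1,000 bytes) and only counts post text, so avatars, attachments, and database overhead are not included. The function name is just for illustration.

```python
POST_COUNT = 600_000  # figure from the estimate above


def estimate_gb(posts, kb_per_post):
    """Total size in gigabytes, using 1 KB = 1,000 bytes."""
    total_bytes = posts * kb_per_post * 1000
    return total_bytes / 1e9


if __name__ == "__main__":
    low = estimate_gb(POST_COUNT, 2)
    high = estimate_gb(POST_COUNT, 3)
    print(f"{low:.1f} to {high:.1f} GB")
```

So the worst case lands a bit under the 2 gigabytes quoted above.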
|
|
|
Post by Polaris Seltzeris on Jul 16, 2019 4:59:05 GMT
I like the idea. The only concern would be who has access to this archive considering it would presumably contain all admin threads, including the threads only accessible to forum administrators (if I recall correctly, there is a recycle bin category for 'deleted' threads and a forum admin lounge).
I also doubt that ProBoards will one day simply cease existing. Usually what happens with these types of companies is that they get integrated into larger companies, but if ProBoards went under and was no longer profitable, most likely they would at the very least allow you to download this data.
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on Jul 16, 2019 6:00:43 GMT
go ahead
|
|
StevenNL2000
Forum Admin
Posts: 6,415
| Likes: 6,936
IGN: StevenNL2000
Timezone: UTC+01:00
|
Post by StevenNL2000 on Jul 16, 2019 10:36:07 GMT
I've looked into this before. What you want to do is a violation of the ProBoards TOS and will get our forum banned, i.e. we lose all our data right now instead of when ProBoards dies: www.proboards.com/tos
|
|
xfilez
Veteran Member
paint me like one of your french girls
Posts: 2,667
| Likes: 3,303
|
Post by xfilez on Jul 16, 2019 11:22:50 GMT
god damnit proboards is such a buzzkill
|
|
|
Post by Polaris Seltzeris on Jul 16, 2019 16:22:09 GMT
xfilez said: "god damnit proboards is such a buzzkill"
Actually, that's an incredibly reasonable thing for a service like this to have in their TOS. Especially the first part which is more focused on user privacy.
|
|
Tobi
Veteran Member
Retired, currently undergoing health issues.
Posts: 497
| Likes: 178
|
Post by Tobi on Jul 16, 2019 16:32:04 GMT
We should do the same for the server considering it turns 10 in about 16 months.
|
|
tozzit
Veteran Member
Posts: 2,329
| Likes: 1,709
|
Post by tozzit on Jul 16, 2019 18:14:43 GMT
xfilez said: "god damnit proboards is such a buzzkill"
Polaris Seltzeris said: "Actually, that's an incredibly reasonable thing for a service like this to have in their TOS. Especially the first part which is more focused on user privacy."
Stop using logic godamnit
|
|
|
Post by Polaris Seltzeris on Jul 16, 2019 18:29:18 GMT
Polaris Seltzeris said: "Actually, that's an incredibly reasonable thing for a service like this to have in their TOS. Especially the first part which is more focused on user privacy."
tozzit said: "Stop using logic godamnit"
I just shit myself.
|
|
|
Post by awesomelink234 on Jul 16, 2019 18:47:09 GMT
Hell yeah that sounds great
|
|
|
Post by xfilez on Jul 16, 2019 20:11:43 GMT
xfilez said: "god damnit proboards is such a buzzkill"
Polaris Seltzeris said: "Actually, that's an incredibly reasonable thing for a service like this to have in their TOS. Especially the first part which is more focused on user privacy."
Oh, I think we misunderstand each other. I do agree that it's a reasonable thing to include in their service (for user privacy reasons, as you've touched on). However, that same thing prevents us from archiving the history of TF on this forum
|
|
|
Post by Polaris Seltzeris on Jul 16, 2019 20:43:25 GMT
Polaris Seltzeris said: "Actually, that's an incredibly reasonable thing for a service like this to have in their TOS. Especially the first part which is more focused on user privacy."
xfilez said: "Oh, I think we misunderstand each other. I do agree that it's a reasonable thing to include in their service (for user privacy reasons, as you've touched on). However, that same thing prevents us from archiving the history of TF on this forum"
The other part of the TOS specifically mentions using spiders, robots, avatars, or intelligent agents to collect/harvest information. They have every reason to do this, including for privacy reasons, as they don't want web crawlers data mining their services. We wouldn't want third party web crawlers data mining us, and I doubt most forums that use ProBoards would either.
|
|
Video
Forum Admin
An op's rights activist
Posts: 5,585
| Likes: 5,893
IGN: VideoGameSmash12, videogamesm12
Old IGN: https://namemc.com/profile/VideoGameSmash12.2, https://namemc.com/profile/videogamesm12.1
Discord: Video#9801
Birthdate (MM/DD): 07/16
Timezone: UTC-07:00
|
Post by Video on Jul 16, 2019 20:47:28 GMT
Tobi said: "We should do the same for the server considering it turns 10 in about 16 months."
We do already.
|
|
|
Post by Hockey on Jul 17, 2019 4:02:53 GMT
StevenNL2000 said: "I've looked into this before. What you want to do is a violation of the ProBoards TOS and will get our forum banned, i.e. we lose all our data right now instead of when ProBoards dies: www.proboards.com/tos"
I haven't studied up on that part of the TOS recently, but if I had to guess, only the user doing the scraping would be banned. It would be silly to ban an entire forum because a malicious actor decided to scrape the boards.
|
|
|
Post by StevenNL2000 on Jul 17, 2019 12:10:12 GMT
StevenNL2000 said: "I've looked into this before. What you want to do is a violation of the ProBoards TOS and will get our forum banned, i.e. we lose all our data right now instead of when ProBoards dies: www.proboards.com/tos"
Hockey said: "I haven't studied up on that part of the TOS recently, but if I had to guess, only the user scraping would be banned. It would be silly to ban an entire forums because a malicious actor decided to scrape the boards."
It would be silly, but unfortunately we don't get to dispose of specific terms based on whether we think they are silly or not. I think this part of the terms was specifically written against your goal, namely making the entire forum content available outside of ProBoards. I mean, I don't think they would notice if you used a tool to export something like 1 thread per hour, but then you will be caught up to today in November 2026.
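The November 2026 figure can be sanity-checked with a few lines. This is only a back-of-the-envelope sketch: the thread does not state the forum's actual thread count, so the script works backwards from the dates, and the function names and the 64,000 round number are assumptions made for illustration. It also ignores new threads created while the export is running.

```python
from datetime import datetime, timedelta


def threads_exported(start, end, threads_per_hour=1.0):
    """How many threads a steady export gets through between two dates."""
    hours = (end - start).total_seconds() / 3600
    return int(hours * threads_per_hour)


def finish_date(start, thread_count, threads_per_hour=1.0):
    """When an export of thread_count threads would finish."""
    return start + timedelta(hours=thread_count / threads_per_hour)


if __name__ == "__main__":
    posted = datetime(2019, 7, 17)
    # Thread count implied by "1 thread per hour until November 2026".
    print(threads_exported(posted, datetime(2026, 11, 1)))
    # Assumed round figure of 64,000 threads at 10 threads per hour.
    print(finish_date(posted, 64_000, threads_per_hour=10))
```

At 1 thread per hour, the November 2026 estimate implies a backlog of roughly 64,000 threads.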
|
|