Interdisciplinary projects

by Tomaso Erseghe

Dear all, 

from the IP groups Excel sheet it seems that all the groups are covered, but your help and effort are needed to check that everything is going in the right direction.

So, meet your SNA colleagues as soon as possible, discuss the project together, start downloading the data, and check that the data you get is appropriate for your study. Ideally, you should have all the data within 7 to 10 days, so that you can then concentrate on the analysis. Everyone should take part in this initial downloading phase, while in the later analysis phase each participant should be assigned a different duty.

Remember that you are required

  • to develop your algorithms (it is important that each of you is assigned a different aspect, to be discussed with the group, and data downloading does not count as one), and
  • to provide pleasant and meaningful visual representations in the form of plots and graphs.

For the final exam you will also be required

  • to submit your code,
  • to add to the final report a description of the algorithms you used (the rest of the report is the responsibility of the SNA students),
  • to help prepare the slides, and
  • to present orally the part you developed. There will be a single group presentation in which the SNA students introduce the project, you describe the analytical/programming part, and they draw the conclusions. You will have 5 minutes each for the oral presentation, so, for example, a group of 8 people will have 40 minutes to present their project.

There is currently a small issue with downloads from Reddit, kindly pointed out by Giacomo Dandolo: it seems that Reddit no longer allows you to create new apps (you can ask Reddit, but we do not know how long they take or whether the request will be accepted). So, if you do not have an active app, use the credentials given in the lab. If more are needed, I can provide a couple of additional credentials; simply ask me. Reddit also lowered the maximum amount of data you can download, from 1000 posts per search to 250, but you may still be able to download a sufficient amount of data, so give it a try.
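
Since a single search returns at most 250 posts, one possible workaround is to run several searches (for example with different keywords) and merge the results, dropping duplicates. The following is only a minimal sketch: it assumes each downloaded post has been stored as a dictionary with a unique `id` field, the function name `merge_search_results` and the sample data are hypothetical, and the download step itself is left out.

```python
def merge_search_results(batches):
    """Merge several lists of posts, keeping the first copy of each post id."""
    seen = set()
    merged = []
    for batch in batches:
        for post in batch:
            post_id = post.get("id")
            # Skip posts without an id and posts already collected.
            if post_id is None or post_id in seen:
                continue
            seen.add(post_id)
            merged.append(post)
    return merged


# Hypothetical results of two searches that share one post.
batch_a = [{"id": "a1", "title": "first"}, {"id": "a2", "title": "second"}]
batch_b = [{"id": "a2", "title": "second"}, {"id": "b7", "title": "third"}]

posts = merge_search_results([batch_a, batch_b])
print(len(posts))
```

Whether this yields enough data depends on how much the searches overlap, so it is worth checking the merged count early.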

For those of you who know R, there is an R package for Reddit (see the attached file) that might help.

Also, Giacomo found public datasets (scraped from 2005 to 2024) that contain basically everything (submissions, i.e. posts, and comments) up to that date, and that could prove useful because they remove the need to scrape data directly from Reddit. The catch is that they have to be downloaded via torrent, and the resulting .zst files then need to be decoded using these scripts. If you go for this option, please be so kind as to share what you learn with the other NS students to speed things up.
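
For orientation, decoding such a file could look roughly like the sketch below. This is not the linked scripts, only an illustration under stated assumptions: that each decompressed line is one JSON object, that records carry a `subreddit` field, and that the third-party `zstandard` package is installed. The helper names `open_zst_text` and `iter_records` are hypothetical.

```python
import io
import json


def open_zst_text(path):
    """Open a .zst file as a text stream (needs the third-party `zstandard` package)."""
    import zstandard  # imported here so the rest of the sketch runs without it

    raw = open(path, "rb")
    # Assumption: the dumps were compressed with a large window, so raise the limit.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    return io.TextIOWrapper(dctx.stream_reader(raw), encoding="utf-8", errors="replace")


def iter_records(lines, subreddit=None):
    """Yield one dict per JSON line, optionally keeping a single subreddit."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged lines instead of stopping
        if not isinstance(record, dict):
            continue
        if subreddit is None or record.get("subreddit") == subreddit:
            yield record


# Small in-memory stand-in for a decompressed file (made-up data).
sample = io.StringIO(
    '{"subreddit": "python", "id": "x1"}\n'
    "\n"
    '{"subreddit": "rstats", "id": "x2"}\n'
    "not json\n"
)
kept = list(iter_records(sample, subreddit="python"))
print(kept)
```

On a real file the two helpers would be combined, e.g. `with open_zst_text("submissions.zst") as stream:` followed by a loop over `iter_records(stream, "python")`, where the file name is a placeholder. Reading line by line keeps memory use low, which matters because these dumps are large.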

Best,

TE