You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
Bonface Munyoki 121d955683
web: arxiv: Replace field with "fld"
6 months ago
.github Comment out instructions in issue templates 7 months ago
etc Add "feed-client-url" params 6 months ago
newsfeed newsfeed: update-feed: Remove default parameters 6 months ago
tests test: arxiv-test: Use update "parse-arxiv-search-terms" function 6 months ago
web web: arxiv: Replace field with "fld" 6 months ago
.gitignore Gitignore compiled racket byte-code 1 year ago
LICENSE Initial commit 1 year ago
README.org Replace "racket-minimal" with "racket" guix demo 6 months ago
info.rkt info: New libraries; html-parsing and sxml 6 months ago
main.rkt main: Add arxiv to list of exposed modules 6 months ago

README.org

What is Feed Analyser?

This aggregates content from different channels(for now only twitter and github commits are properly supported) and displays them in one place. In most hackathons, organisations and sometimes events, content spans different places e.g. Discord, slack, IRC, mailing list, issue trackers etc. This project aims to aggregate news from different places and display them in one place.

For now, the project supports twitter and github.

It also allows you to vote on the feeds. You can view a demo here

How does it work?

The program scrapes content from twitter and github and stores the data into redis queues with a score. You can then read from the redis queue.

There's also an in-built provision for voting on queues, whereby each vote updates the zscore of an individual item thereby increasing or decreasing the likelihood of it being displayed on a page.

Why are you storing things to Redis? Why not just generate pages straight from them?

We'd like to extend the program later so that it can provide a source of aggregate data to other programs. To achieve this, you'd need an intermediary data store, and for this, Redis was chosen.

How do I run this?

If you are fetching data from twitter, ensure you have Python3 and twint installed as this is used to fetch data from twitter without using twitter's restrictive API. And abide by the Twitter terms of service and usage policies when using that data!

  • Configure the feed reader

See configuration examples.

  • Start the twitter/ gh fetching daemon:

# Replace example.conf.rkt with your own conf file with the correct settings
racket newsfeed/update-feed.rkt -c etc/biohackathon.conf.rkt

The worker daemon will fetch data from twitter and github on a set time frame as set in the configuration - typically every 6 hours or so - and feed the results to redis. Note the feed-prefix that distinguishes the actual feeds.

  • Run the voting 'server'

The voting server responds to requests from a web service. It serves

# Replace the conf with your own conf file with the correct settings
racket newsfeed/voting-server.rkt -c etc/biohackathon.conf.rkt

Running with a GNU Guix container

Right now, since the feedanalyser uses redis-racket– which throws errors in Racket v7.9, you could use guix time-machine with a channels file to set things up!

First, the channels files looks(tried and tested so it works) something like:

(list
 (channel
  (name 'guix-bioinformatics)
  (url "https://github.com/BonfaceKilz/guix-bioinformatics")
  (branch "master")
  (commit "ad741c19d6a7bfc16610fa25689398243deaa7a5"))
 (channel
  (name 'guix)
  (url "https://git.savannah.gnu.org/git/guix.git")
  (branch "master")
  (commit "1f39175d1a030877b034a0ba85ef94b987b50b3e")))

The above channel definitions contains the working versions of racket and python-twint prior the v7.9 bump in upstream Guix.

After setting up the channels file, create a guix container with racket and other useful tools ("expose" /usr/share/zoneinfo bacause it's required by one of the Racket packages. The -N flag allows you to connect to the internet, and to also connect to redis which runs outside your container):

./pre-inst-env guix time-machine -C channels.scm -- environment -N -C --expose=/usr/share/zoneinfo --ad-hoc racket python-twint nss-certs coreutils

Now export the SSL_CERTS so that you can use Racket over the internet:

export SSL_CERT_DIR="$GUIX_ENVIRONMENT/etc/ssl/certs" && export SSL_CERT_FILE="$GUIX_ENVIRONMENT/etc/ssl/certs/ca-certificates.crt"

Finally, the usual(inside the env):

raco pkg install

Now you can run the feed and associated voting server using the commands:

# Start the feed daemon in the background
racket newsfeed/update-feed.rkt -c etc/biohackathon.conf.rkt &

# Start the voting server. Be sure to configure things properly!
racket newsfeed/voting-server.rkt -c etc/biohackathon.conf.rkt

PS: You could set racket up in it's own profile by running something like:

./pre-inst-env guix time-machine -C channels.scm -- install racket python-twint -p ~/opt/racket

Suggestions/ Help Wanted

  • Fetch data from Slack

  • Come up with an algorithm to sieve out noise(Machine Learning)

  • Expose content using an API

  • Come up with a sane deployment process

RoadMap

  • [x] Better Parsing from Twitter

  • Fetch commits

  • Add package and dependencies to Guix

  • Integrate to GN2

  • Fetch Pull Requests and merges from GitHub and display them

  • Fetch Data from Slack

  • Make a lib layer and add project to upstream to racket-lang.org

  • Fetch Data from IRC

  • Add NLP to make more meaningful data

LICENSE

This tool is published under the GPLv3. See LICENSE.