CiteULike datasets

Description

From the data page:

Who-posted-what data

The latest data snapshot can always be downloaded at http://static.citeulike.org/data/current.bz2

Older datasets are available on a daily basis and can be found at URLs of the form http://static.citeulike.org/data/2007-05-30.bz2

Data is available from 2007-05-30 onwards.
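The URL scheme described above can be sketched in a few lines of Python. This is only an illustration: the helper name `snapshot_url` and the constants are hypothetical, and the sketch assumes every daily snapshot follows the `YYYY-MM-DD.bz2` pattern shown in the example URL.

```python
from datetime import date
from typing import Optional

# Base location and first available date, both taken from the text above.
BASE_URL = "http://static.citeulike.org/data/"
FIRST_SNAPSHOT = date(2007, 5, 30)


def snapshot_url(day: Optional[date] = None) -> str:
    """Return the URL of a snapshot (hypothetical helper).

    With no argument, point at the latest snapshot; otherwise build the
    dated URL, assuming the YYYY-MM-DD.bz2 naming holds for every day.
    """
    if day is None:
        return BASE_URL + "current.bz2"
    if day < FIRST_SNAPSHOT:
        raise ValueError("no snapshots are available before 2007-05-30")
    return BASE_URL + day.isoformat() + ".bz2"
```

Calling `snapshot_url(date(2007, 5, 30))` reproduces the example URL given above. The function only builds the string; whether a given day's file actually exists on the server is not checked.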

The file constitutes an anonymous dump of who posted what and when the posting took place. There is no data in this file which is not already available publicly through the web site, so there are no privacy implications for making it available. The advantage is that it's available in one file rather than having to spider the entire site to get at the information (please don't do that!).

The file is a simple Unix ("\n" line endings) text file with pipe ("|") delimiters. The columns are:

  • The CiteULike article id which was posted
  • An obfuscated representation of the username (a salted MD5 hash of the true username). Again, it is possible to piece back together what the true username is by scraping the site, but I'd rather you didn't do that. The reason I've gone to the trouble of obfuscation is primarily a slightly paranoid anti-spam measure
  • The date and time the article was posted to the site
  • The tag the user used to post it

NB: if a user posts an article with n tags, then this will result in n rows in the file.
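As a rough sketch of how the four-column, one-row-per-tag layout might be read, the Python below splits each line on the pipe delimiter and regroups the tags per posting. The function names are hypothetical, and several details are assumptions not stated on the data page: that the file is UTF-8 text, that every row has exactly four fields, and that the timestamp can be carried as an opaque string (its exact format is not documented here).

```python
import bz2
from collections import defaultdict


def parse_line(line):
    """Split one row into (article_id, user_hash, posted_at, tag).

    Assumption: four pipe-delimited fields per row. maxsplit=3 keeps any
    stray "|" inside the last field rather than raising an error.
    """
    article_id, user_hash, posted_at, tag = line.rstrip("\n").split("|", 3)
    return article_id, user_hash, posted_at, tag


def group_tags(lines):
    """Collapse the n rows of an n-tag posting into one entry.

    Returns a dict mapping (article_id, user_hash) to the list of tags,
    in the order the rows appear.
    """
    posts = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        article_id, user_hash, _posted_at, tag = parse_line(line)
        posts[(article_id, user_hash)].append(tag)
    return dict(posts)


def read_snapshot(path):
    """Read a downloaded .bz2 snapshot from a local path (assumed UTF-8)."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return group_tags(fh)
```

Grouping on the (article id, user hash) pair is one design choice among several; it treats a user re-posting the same article at a later time as the same posting. Adding the timestamp to the key would keep such postings apart.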

Article linkout data

Mapping CiteULike article_ids to resources on the web can be done with the linkout table. The current snapshot is available at http://static.citeulike.org/data/linkouts.bz2

Openness: OPEN (?)

• License: none specified, but the manner in which the data is made available suggests it is open.
• Access: good.
  • Bulk: yes.

Resources

(None)

Additional information

Field       Value
Source      http://www.citeulike.org/faq/data.adp
Author      Not given
Maintainer  Not given

Cite this

CiteULike datasets. No author.
Retrieved 10:48, May 21, 2013 (UTC).
the Data Hub
