
ACM RecSys Challenge 2017

About

The ACM RecSys Challenge 2017 focuses on the problem of job recommendations on XING in a cold-start scenario. The challenge consists of two phases:

  1. offline evaluation: fixed historic dataset and fixed targets for which recommendations/solutions need to be computed/submitted.
  2. online evaluation: dynamically changing targets. Recommendations submitted by the teams will actually be rolled out in XING's live system (details about the online evaluation such as the approach for ensuring fair conditions between the teams will follow).

Both phases aim at the following task:

Task: given a new job posting p, the goal is to identify those users who (a) may be interested in receiving the job posting as a push recommendation and (b) are also appropriate candidates for the given job.

For both offline and online evaluation, the same evaluation metrics and the same types of data sets will be used. The offline evaluation is essentially used as an entry gate to the online evaluation.

Recommendation Scenario

The online evaluation focuses on a push recommendation scenario in which new items (job postings) are given and users need to be identified...

  1. who are interested in job postings in general (e.g. open to new job offers, willing to change their job)
  2. who are interested in the particular job posting which they are notified about
  3. who are appropriate candidates for the given job posting (e.g. recruiters who own the job postings indicate that they are interested in the candidate)

In the online challenge, teams submit only their best users for an item to the system. For each target item, teams are allowed to submit one or more target users. However, each user can only be submitted once. We decided on this restriction because push recommendations are presented to users in a more prominent way. These recommendations are then delivered to the users over the following channels.

  • Channels: given the list of recommendations such as (p1, u42), (p1, u23), ... where pi is the i-th target posting and uj is the j-th target user, the recommendations are delivered to users through the following channels:
    • activity stream: "Vacancies matching your profile" story in the stream on xing.com and in the mobile apps (see screenshot)
    • jobs marketplace: an orange notification bubble in the side-bar and an orange label "new" highlight the new job recommendation, e.g. on xing.com/jobs or in the mobile apps (see screenshot)
    • emails: if the user did not see the push recommendation, the user may receive an email that points them to the job recommendation (see screenshot)
    • recruiter tools: users who receive a job posting as a push recommendation are also likely to appear as candidate recommendations to recruiters, for example, in the so-called XING talent manager (see screenshot)
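The restriction that each user can only be submitted once can be enforced before a solution is sent. The following is a minimal sketch, assuming the restriction applies across all target items of a submission and that a team has already computed a relevance score for each candidate (item, user) pair. The function name, the triple layout, and the greedy strategy are illustrative assumptions, not part of the challenge specification.

```python
def assign_unique_users(candidates):
    """Greedily keep the highest-scoring pairs so that no user is used twice.

    candidates: iterable of (item_id, user_id, score) triples (hypothetical layout).
    Returns a dict mapping item_id -> list of user_ids.
    """
    assigned = {}
    used_users = set()
    # Visit candidate pairs from the highest to the lowest score.
    for item_id, user_id, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if user_id in used_users:
            continue  # this user was already assigned to a better-scoring item
        used_users.add(user_id)
        assigned.setdefault(item_id, []).append(user_id)
    return assigned


candidates = [
    ("p1", "u42", 0.9),
    ("p2", "u42", 0.8),  # dropped: u42 is already assigned to p1
    ("p1", "u23", 0.7),
    ("p2", "u7", 0.6),
]
solution = assign_unique_users(candidates)
# solution == {"p1": ["u42", "u23"], "p2": ["u7"]}
```

A greedy pass is only one possible strategy; it does not guarantee the assignment with the highest total score.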
    Challenges

    Some challenges that the participating teams will need to solve:

    Evaluation Metrics

    Given a list of target items targetItems, for each of which the recommender selects the users to whom the item is pushed as a recommendation, we compute the leaderboard score as follows:

    score(targetItems) = targetItems.map(item => score(item, recommendations(item))).sum

    Here, recommendations(item) specifies the list of users who will receive the given item as push recommendation. The function score(item, users) is defined as follows:

    score(item, users) = 
      users.map(u => userSuccess(item,u)).sum + itemSuccess(item, users)
      
      userSuccess(item, user) = 
        (
            (if (clicked) 1 else 0)
          + (if (bookmarked || replied) 5 else 0)
          + (if (recruiter interest) 20 else 0)
          - (if (delete only) 10 else 0)
        ) * premiumBoost(user)
    
      premiumBoost(user) = if (user.isPremium) 2 else 1 
          
      itemSuccess(item, users) = 
        if (users.filter(u => userSuccess(item, u) > 0).size >= 1) {
          if (item.isPaid) 50
          else 25
        } else 0 
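
The definitions above can be restated as runnable code for local experiments. The following Python sketch mirrors the pseudocode term by term. The `Feedback` container and its field names are assumptions made for illustration, and the official evaluation code may differ in details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Observed feedback for one (item, user) push; field names are illustrative."""
    clicked: bool = False
    bookmarked: bool = False
    replied: bool = False
    recruiter_interest: bool = False
    delete_only: bool = False  # mirrors the "delete only" condition above
    is_premium: bool = False


def user_success(fb):
    """Weighted sum of the feedback signals, doubled for premium users."""
    points = (
        (1 if fb.clicked else 0)
        + (5 if fb.bookmarked or fb.replied else 0)
        + (20 if fb.recruiter_interest else 0)
        - (10 if fb.delete_only else 0)
    )
    return points * (2 if fb.is_premium else 1)


def item_success(feedbacks, is_paid):
    """Bonus granted once per item if at least one user has a positive score."""
    if any(user_success(fb) > 0 for fb in feedbacks):
        return 50 if is_paid else 25
    return 0


def score_item(feedbacks, is_paid):
    return sum(user_success(fb) for fb in feedbacks) + item_success(feedbacks, is_paid)


def leaderboard_score(recommendations):
    """recommendations: dict item_id -> (is_paid, list of Feedback)."""
    return sum(score_item(fbs, is_paid) for is_paid, fbs in recommendations.values())


example = {
    "p1": (True, [
        Feedback(clicked=True, bookmarked=True, is_premium=True),  # (1 + 5) * 2 = 12
        Feedback(recruiter_interest=True),                         # 20
        Feedback(delete_only=True),                                # -10
    ]),
    "p2": (False, [Feedback(delete_only=True)]),                   # -10, no item bonus
}
total = leaderboard_score(example)  # (12 + 20 - 10 + 50) + (-10 + 0) = 62
```

In this example the paid item p1 earns the item bonus of 50 because at least one recommended user has a positive score, while p2 earns none.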
    

    Meaning:

    Purpose of evaluation metrics:

    The above evaluation metrics will be applied for both offline evaluation and online evaluation (in the offline evaluation, the target items won't change during the challenge, while in the online evaluation, new target items are released on a daily basis).

    Dataset

    The training dataset is intended for experimenting and training your models. You can split the interaction data into training and test data. For example, you can leave out the last complete week of the interaction data and then try to predict, for a given job posting, which users will interact positively with the posting.
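
A time-based split of this kind could look as follows. This is a minimal sketch under stated assumptions: the row layout (user_id, item_id, interaction_type, timestamp) and the interaction type labels are hypothetical, since the field list of interactions.csv is not reproduced here, and "last complete week" is approximated by the 7 days before the latest timestamp.

```python
from datetime import datetime, timedelta, timezone


def split_last_week(interactions):
    """Hold out the most recent 7 days of interactions as test data.

    interactions: list of (user_id, item_id, interaction_type, timestamp) rows,
    where timestamp is a datetime (hypothetical layout).
    """
    latest = max(row[3] for row in interactions)
    cutoff = latest - timedelta(days=7)
    train = [row for row in interactions if row[3] <= cutoff]
    test = [row for row in interactions if row[3] > cutoff]
    return train, test


base = datetime(2017, 2, 1, tzinfo=timezone.utc)
rows = [
    ("u1", "p1", "click", base),
    ("u2", "p1", "click", base + timedelta(days=10)),
    ("u1", "p2", "bookmark", base + timedelta(days=16)),
    ("u3", "p2", "click", base + timedelta(days=20)),
]
train, test = split_last_week(rows)
# The cutoff is base + 13 days, so the first two rows are training data
# and the last two rows are test data.
```

Because the task is a cold-start scenario, one may additionally want to keep only those test items that never occur in the training split; that refinement is not shown here.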

    Anonymization, pseudonymization, noise

    The training dataset is a semi-synthetic sample of XING's dataset, i.e. it is incomplete and enriched with noise in order to anonymize the data. For example:

    Attempting to identify users or to reveal any private information about the users or information about the business from which the data comes is strictly forbidden (cf. Rules).

    Interactions

    interactions.csv: Interactions are all transactions between a user and an item including recruiter interests as well as impressions. Fields:

    Users

    users.csv: Details about those users who appear in the above datasets. Fields:

    Items

    items.csv: Details about the job postings that were and should be recommended to the users.

    Targets

    The dataset contains two additional files that contain target item IDs and target user IDs:

    Note: submitted solutions are only allowed to contain items and users from the above files.
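
Before uploading, a solution can be checked against the target lists. The sketch below assumes the target item IDs and target user IDs have already been loaded into Python sets and that a solution is held as a dict from item ID to user IDs. The function name and this in-memory layout are illustrative, not the official submission format.

```python
def filter_to_targets(solution, target_items, target_users):
    """Drop every item and user that is not part of the target sets.

    solution: dict item_id -> list of user_ids (hypothetical layout).
    """
    cleaned = {}
    for item_id, users in solution.items():
        if item_id not in target_items:
            continue  # item is not a target item
        kept = [u for u in users if u in target_users]
        if kept:
            cleaned[item_id] = kept
    return cleaned


raw = {"p1": ["u1", "u9"], "p3": ["u2"]}
valid = filter_to_targets(raw, target_items={"p1", "p2"}, target_users={"u1", "u2"})
# valid == {"p1": ["u1"]}: u9 is not a target user and p3 is not a target item
```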

    Baseline

    The baseline uses XGBoost and is solely content-based. Details about the baseline and the Python code are available at: github.com/recsyschallenge/2017/baseline/

    Participation

    For participating in the challenge, you will need to...

    Leaderboard

    The public leaderboard is based on a 30% random sample of the entire ground truth. See: recsys.xing.com/leaders

    Rules

    Data

    Datasets that are released as part of the RecSys challenge are semi-synthetic, non-complete samples, i.e. XING data is sampled and enriched with noise. Regarding the released datasets, participants have to stick to the following rules:

    1. Attempting to identify users or to reveal any private information about the users or information about the business from which the data comes is strictly forbidden.
    2. It is strictly forbidden to share the datasets with others.
    3. It is not allowed to use the data for commercial purposes.
    4. The data may only be used for academic purposes.
    5. It is not allowed to use the data in any way that harms XING or XING's customers.

    Licence of the data: All rights are reserved by XING AG.

    Final Paper

    Each team should submit a paper describing the algorithms that they developed for the task (see paper submissions & workshop). Teams without a paper submission to the RecSys Challenge workshop will be removed from the final leaderboard.

    No Crawling on XING

    It is not allowed to crawl additional information from XING (e.g. via XING's APIs or by scraping details from XING pages).

    L’esprit sportif / Fair-play

    Please stick to the rules above, only sign up for one team, and stick to the submission limits: you can upload at most 20 solutions per day (for the offline challenge). We may suspend a team from the challenge if we get the impression that the team is not playing fair.

    Ask us

    If you are unsure whether something is allowed or not, contact us (e.g. create an issue on GitHub) and we will be happy to help you. Above all, remember it's all for science, so be creative, not evil!

    Questions

    Questions and remarks about the procedure and other aspects of the challenge can be submitted as GitHub issues.

    Prizes

    Prizes are given out to the teams that achieved the highest scores at the end of the online evaluation:

    In order to get the prize money, teams have to describe their algorithms in an accompanying paper and present it during the RecSys Challenge workshop in Como, Italy.

    Timeline

    Beginning of March: RecSys challenge starts
      • RecSys challenge starts with offline evaluation
      • Submission system will be available via recsys.xing.com (currently offline)
      • Maximum number of submissions per day: 20
    April 16th (23:59 Hawaiian time): Offline evaluation ends
      • Based on the overall leaderboard of the offline challenge, the top teams (probably top 20) are asked to join the online challenge
      • Top teams get access to the API of the online challenge
      • The submission of the offline challenge will continue to be open
      • After this deadline there will be a possibility to join the online challenge for teams who achieve awesome scores
    May 1st: Online challenge starts
      • Every day, new target items will be released for which the teams are supposed to compute recommendations, i.e. identify users that may be interested in these items
      • Teams download the new target list via API, compute recommendations and submit their solutions via API. Teams can script their recommender systems to regularly pull from the API to check for updates.
    June 4th (23:59 Hawaiian time): Online evaluation ends
      • Last day of online evaluation
      • Submission system closes
    June 12th:
      • Official results will be announced
      • Winner of the challenge = winner of the online challenge
    June 18th: Paper submission deadline for RecSys Challenge workshop
    July 3rd: Notifications about paper acceptance
    July 17th: Deadline for camera-ready papers
    August 27th-31st: Workshop will take place as part of the RecSys conference in Como, Italy
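
The daily online-phase workflow in the timeline (pull the current target list, compute recommendations, submit) can be scripted as a polling loop. The sketch below is deliberately independent of the real API: `fetch_targets`, `recommend`, and `submit` are hypothetical callables that a team would implement against the actual online-challenge API once access is granted.

```python
import time


def poll_and_submit(fetch_targets, recommend, submit, seen, rounds, sleep_seconds=0.0):
    """Poll for target items and submit recommendations for unseen ones.

    fetch_targets() -> iterable of item IDs currently released (hypothetical)
    recommend(item_id) -> list of user IDs (the team's own recommender)
    submit(item_id, user_ids) -> sends the solution (hypothetical)
    seen: set of item IDs that were already handled; updated in place
    """
    submitted = []
    for _ in range(rounds):
        for item_id in fetch_targets():
            if item_id in seen:
                continue  # recommendations for this item were already sent
            submit(item_id, recommend(item_id))
            seen.add(item_id)
            submitted.append(item_id)
        time.sleep(sleep_seconds)
    return submitted


# In-memory stand-ins for the API, used only to demonstrate the loop.
batches = iter([["p1"], ["p1", "p2"]])
sent = {}
handled = poll_and_submit(
    fetch_targets=lambda: next(batches),
    recommend=lambda item_id: ["u1"],
    submit=lambda item_id, users: sent.update({item_id: users}),
    seen=set(),
    rounds=2,
)
# handled == ["p1", "p2"]; p1 is submitted only once although it is listed twice
```

Passing the API operations in as callables keeps the loop testable without network access; a real script would also need error handling for failed requests.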

    Paper Submissions & Workshop

    Each team - not only the top teams - should submit a paper that describes the algorithms that they used for solving the challenge. Those papers will be reviewed by the program committee (non-blind double review). At least one of the authors is expected to register for the RecSys Challenge workshop which will take place as part of the RecSys conference in Como, Italy.

    Format

    Papers should be 4 to 6 pages long. They have to be uploaded as PDF and have to be prepared according to the standard ACM SIG proceedings format: templates.

    Upload Paper

    Papers (and later also the camera-ready versions) have to be uploaded via EasyChair: submit paper via EasyChair

    Publication

    We aim to publish the accepted papers in a special volume of ACM SIG Proceedings dedicated to the challenge (cf. last year's proceedings: ACM, DBLP).

    Program Committee