Challenge
MoReBikeS: 2015 ECML-PKDD Challenge on "Model Reuse with Bike rental Station data"
Welcome to the main page of the ECML PKDD 2015 Challenge on Model Reuse with Bike rental Station data!
NEWS AND RESULTS
FINAL RESULTS ON FULL TEST DATA
Of the 23 teams in the small test data challenge, 10 teams participated in the full test data challenge. The results are as follows.
Rank | Team | Team members | Mean absolute error |
---|---|---|---|
#1 | Hao | Hao Song, Peter Flach | 2.0143 |
#2 | Yu Chen | Yu Chen, Peter Flach | 2.0515 |
#3 | Arun B S | Arun Bala Subramaniyan, Rong Pan | 2.0667 |
#4 | DCC_UFLA_15 | Fernando Simeone, Diego S. Mendes, Ahmed A. A. Esmin | 2.0794 |
#5 | Farhan | Chowdhury Farhan Ahmed | 2.1027 |
#6 | Denis | Denis Moreira dos Reis | 2.1725 |
#7 | thelastone | Víctor Núñez Monsálvez | 2.2098 |
#8 | arp | Andrés Ramos, Francisco Rangel | 2.3092 |
#9 | Tom&Niall | Tom Diethe, Niall Twomey, Peter Flach | 2.3921 |
#10 | Dmlab | Gergo Barta | 2.4155 |
The top 3 participants have each received a free ECML/PKDD 2015 registration as their prize.
Full test data are available here: http://reframe-d2k.org/File:Morebikes_full_test_data.csv.zip
CHALLENGE WORKSHOP
The challenge workshop was held on the 11th Sept in Porto at ECML-PKDD 2015.
The schedule is available here: http://users.dsic.upv.es/~flip/LMCE2015/MoReBikeS_Schedule.html.
The slides of Meelis Kull presenting the summary of the challenge are available here: http://reframe-d2k.org/File:2015_09_11_morebikes_summary.pdf
CHALLENGE WORKSHOP PAPERS
Here are the challenge workshop papers from all 10 full test data challenge participants.
BEST STUDENT AWARD
MoReBikeS best student award is decided based on the final leaderboard on small test data (see below). The results are as follows:
Award | Rank among all participants | Name | University | Country |
---|---|---|---|---|
Best Student Award | 1 | Yu Chen | University of Bristol | UK |
Best Student Runner-up | 2 | Victor Nuñez Monsalvez | Universitat Politècnica de València | Spain |
The special prize of a free one-year Cyclocity subscription goes to Victor Nuñez Monsalvez: the winner kindly passed the prize to the runner-up because Cyclocity does not operate in the UK.
FINAL LEADERBOARD OF ALL PARTICIPANTS ON SMALL TEST DATA (June 8, 2015)
Earlier leaderboards are available at http://reframe-d2k.org/Challenge_Leaderboards
Rank | Name | Submission number | Mean absolute error |
---|---|---|---|
1 | Yu Chen | 10 | 2.376 |
2 | thelastone | 5 | 2.416 |
3 | Hao | 3 | 2.454 |
4 | Arun B S | 9 | 2.502 |
5 | DCC_UFLA_15 | 1 | 2.532 |
6 | lmontes | 2 | 2.536 |
7 | Farhan | 2 | 2.554 |
8 | arp | 2 | 2.556 |
9 | Denis | 7 | 2.604 |
10 | VLC8 | 3 | 2.690 |
11 | masfworld | 1 | 2.736 |
12 | Dmlab | 12 | 2.829 |
13 | ComUnTir | 1 | 2.922 |
14 | Reem | 2 | 3.068 |
15 | Bikes 3h ago | Baseline | 3.288 |
16 | emakumea | 2 | 3.288 |
17 | kafka | 3 | 3.288 |
18 | AEslava | 2 | 3.422 |
19 | LMAF | 2 | 3.422 |
20 | GLN | 4 | 3.458 |
21 | BigBones | 1 | 4.162 |
22 | Robin | 1 | 4.402 |
23 | yeha | 1 | 4.460 |
24 | iseddel | 1 | 4.518 |
25 | VMM | 1 | 4.612 |
26 | All zeros | Baseline | 7.550 |
FINAL LEADERBOARD OF ALL SUBMISSIONS ON SMALL TEST DATA (June 8, 2015)
Rank | Name | Submission number | Mean absolute error |
---|---|---|---|
1 | Yu Chen | 10 | 2.376 |
2 | thelastone | 5 | 2.416 |
3 | thelastone | 3 | 2.430 |
4 | thelastone | 6 | 2.430 |
5 | thelastone | 4 | 2.434 |
6 | thelastone | 7 | 2.444 |
7 | Hao | 3 | 2.454 |
8 | Yu Chen | 8 | 2.460 |
9 | Yu Chen | 7 | 2.461 |
10 | Yu Chen | 6 | 2.469 |
11 | Hao | 5 | 2.478 |
12 | Hao | 8 | 2.478 |
13 | Hao | 4 | 2.484 |
14 | Hao | 7 | 2.492 |
15 | Hao | 6 | 2.494 |
16 | Yu Chen | 3 | 2.496 |
17 | Arun B S | 9 | 2.502 |
18 | Arun B S | 12 | 2.514 |
19 | Hao | 9 | 2.514 |
20 | Arun B S | 8 | 2.519 |
21 | Yu Chen | 4 | 2.520 |
22 | Yu Chen | 9 | 2.520 |
23 | Arun B S | 11 | 2.526 |
24 | Yu Chen | 11 | 2.528 |
25 | DCC_UFLA_15 | 1 | 2.532 |
26 | lmontes | 2 | 2.536 |
27 | Hao | 12 | 2.540 |
28 | Farhan | 2 | 2.554 |
29 | arp | 2 | 2.556 |
30 | arp | 3 | 2.558 |
31 | Arun B S | 10 | 2.560 |
32 | Yu Chen | 12 | 2.562 |
33 | Hao | 10 | 2.564 |
34 | Farhan | 1 | 2.572 |
35 | Hao | 11 | 2.590 |
36 | Denis | 7 | 2.604 |
37 | Denis | 4 | 2.608 |
38 | Yu Chen | 5 | 2.612 |
39 | Yu Chen | 2 | 2.625 |
40 | Hao | 1 | 2.634 |
41 | Denis | 3 | 2.642 |
42 | Denis | 9 | 2.642 |
43 | Denis | 12 | 2.646 |
44 | Denis | 11 | 2.658 |
45 | Denis | 6 | 2.680 |
46 | VLC8 | 3 | 2.690 |
47 | Hao | 2 | 2.700 |
48 | thelastone | 2 | 2.722 |
49 | Arun B S | 4 | 2.724 |
50 | masfworld | 1 | 2.736 |
51 | Denis | 10 | 2.756 |
52 | Denis | 8 | 2.764 |
53 | Denis | 2 | 2.768 |
54 | Arun B S | 5 | 2.774 |
55 | Denis | 5 | 2.774 |
56 | Denis | 1 | 2.778 |
57 | Dmlab | 12 | 2.829 |
58 | lmontes | 1 | 2.912 |
59 | thelastone | 1 | 2.912 |
60 | ComUnTir | 1 | 2.922 |
61 | Dmlab | 10 | 2.966 |
62 | Arun B S | 6 | 3.068 |
63 | Arun B S | 7 | 3.068 |
64 | Reem | 2 | 3.068 |
65 | Dmlab | 5 | 3.099 |
66 | Dmlab | 9 | 3.199 |
67 | Bikes 3h ago | Baseline | 3.288 |
68 | emakumea | 2 | 3.288 |
69 | kafka | 3 | 3.288 |
70 | masfworld | 3 | 3.288 |
71 | Dmlab | 11 | 3.311 |
72 | Dmlab | 8 | 3.315 |
73 | Dmlab | 7 | 3.316 |
74 | Dmlab | 4 | 3.390 |
75 | Yu Chen | 1 | 3.414 |
76 | AEslava | 2 | 3.422 |
77 | LMAF | 2 | 3.422 |
78 | Dmlab | 3 | 3.451 |
79 | GLN | 4 | 3.458 |
80 | Dmlab | 2 | 3.471 |
81 | Arun B S | 3 | 3.556 |
82 | Dmlab | 1 | 3.606 |
83 | GLN | 3 | 3.632 |
84 | Arun B S | 1 | 3.636 |
85 | Arun B S | 2 | 3.640 |
86 | kafka | 6 | 3.652 |
87 | kafka | 5 | 3.678 |
88 | GLN | 2 | 3.762 |
89 | Dmlab | 6 | 3.791 |
90 | AEslava | 4 | 3.854 |
91 | LMAF | 4 | 3.854 |
92 | VLC8 | 2 | 3.924 |
93 | AEslava | 3 | 4.014 |
94 | LMAF | 3 | 4.014 |
95 | AEslava | 5 | 4.050 |
96 | LMAF | 5 | 4.050 |
97 | BigBones | 1 | 4.162 |
98 | Reem | 1 | 4.332 |
99 | Robin | 1 | 4.402 |
100 | kafka | 1 | 4.412 |
101 | yeha | 1 | 4.460 |
102 | GLN | 1 | 4.468 |
103 | iseddel | 1 | 4.518 |
104 | emakumea | 1 | 4.522 |
105 | VLC8 | 1 | 4.538 |
106 | kafka | 4 | 4.580 |
107 | VMM | 1 | 4.612 |
108 | arp | 1 | 4.640 |
109 | BigBones | 2 | 4.664 |
110 | BigBones | 3 | 4.674 |
111 | AEslava | 1 | 4.776 |
112 | LMAF | 1 | 4.776 |
113 | kafka | 2 | 4.846 |
114 | VMM | 2 | 5.166 |
115 | masfworld | 4 | 5.476 |
116 | yeha | 2 | 6.568 |
117 | masfworld | 2 | 6.898 |
118 | All zeros | Baseline | 7.550 |
ABOUT THE CHALLENGE
INTRODUCTION AND MOTIVATION
The task in this challenge is to predict the number of available bikes at every bike rental station 3 hours in advance. There are at least two use cases for such predictions. First, a user plans to rent (or return) a bike in 3 hours' time and wants to choose a bike station which is not empty (or full). Second, the company wants to avoid situations where a station is empty or full and therefore needs to move bikes between stations. For this purpose it needs to know which stations are more likely to be empty or full soon.
In both cases the prediction can be based on the time of day, week, or year and on the weather conditions. Information about the current status of the station can also be used. A successful predictor needs to take all of these aspects into account, as well as the profile of bike availability at the station, learned from historical information. The more historical information is available, the better the predictions can be.
In this challenge we explore a setting where there are 200 stations which have been running for more than 2 years and 75 stations which have been open for just a month. The task is to reuse the models learned on the 200 "old" stations in order to improve prediction performance on the 75 "new" stations. Hence, this challenge evaluates prediction performance on the 75 new stations.
If we gave full historical data for the 200 stations, we would be evaluating model building and model reuse performance at the same time. Therefore, we have decided to build the models ourselves and provide them without the full data they were trained on. Still, full training data for 10 stations is provided in order to facilitate analysis of how a model can be reused at other stations and at later times. For the remaining 190 training stations we provide data for only one month, also to help in deciding how the models can be reused.
PARTICIPATION
This challenge is open to everyone. Participants submit predictions to the public leaderboard, which is refreshed on May 4, 18, 25 and June 1. The results of the last leaderboard will be immediately published as the final results of the small test set challenge.
We encourage everyone to participate in the full test set challenge as well. This requires submitting the code and a paper describing the chosen prediction method by June 15 (changed from June 8) and the predictions on the full test data by June 22. The main focus of the paper should be to explain the solution to other participants and interested people; comparison to other existing methods is not required. The accepted papers are presented at the challenge workshop at ECML PKDD 2015 on September 11, 2015. The winner of the MoReBikeS challenge is the presenting author with the lowest mean absolute error on the full test data.
TASK
The task is to predict the number of bikes in the stations 3 hours in advance.
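The leaderboards above list two baselines, "Bikes 3h ago" and "All zeros". The sketch below shows one plausible reading of them on a toy hourly series: we assume "Bikes 3h ago" predicts the count observed 3 hours before the target time, and "All zeros" always predicts 0. The series, the function names, and this interpretation are illustrative assumptions, not the organisers' code.

```python
from typing import List, Tuple

HORIZON_HOURS = 3  # prediction horizon stated in the task


def persistence_pairs(hourly_counts: List[int]) -> List[Tuple[int, int]]:
    """Pair each prediction (count 3 hours earlier) with the true count."""
    return [
        (hourly_counts[t - HORIZON_HOURS], hourly_counts[t])
        for t in range(HORIZON_HOURS, len(hourly_counts))
    ]


def mae(pairs: List[Tuple[int, int]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Toy hourly bike counts for one station (made-up numbers).
counts = [4, 5, 7, 6, 2, 9]

bikes_3h_ago = persistence_pairs(counts)           # assumed "Bikes 3h ago"
all_zeros = [(0, actual) for _, actual in bikes_3h_ago]  # "All zeros"

print("Bikes 3h ago MAE:", round(mae(bikes_3h_ago), 3))
print("All zeros MAE:", round(mae(all_zeros), 3))
```

Any submitted method should at least beat the first of these on the deployment data before it is worth sending to the leaderboard.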
DESCRIPTION OF DATA AND MODELS
The challenge is to reuse the models learned on 200 training stations (numbered from 1 to 200) for 75 deployment stations (numbered from 201 to 275). The linear models have been trained on the data of the training stations from the period June 2012-September 2014. The deployment data covers all 275 stations for October 2014. The test data covers the 75 test stations for the period November 2014-January 2015. The leaderboard test data covers 25 test stations for the period November 2014-December 2014. The full test data, covering the 50 other test stations for the period November 2014-January 2015, is given to participants after paper submission. The training and deployment datasets cover all hours of the respective periods; however, some time points have missing values, including in the target variable. All the data and models together with detailed information are available here: http://reframe-d2k.org/Challenge_Download.
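One simple way to think about reusing the provided linear models is to score each of them on a new station's one month of deployment data and keep the one with the lowest mean absolute error. The sketch below illustrates that idea only; it is not the organisers' or any participant's method. The `LinearModel` class, the feature name `bikes_now`, and the row layout are hypothetical, and the real model files and features are described on the download page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[Dict[str, float], Optional[float]]  # (features, true bike count)


@dataclass
class LinearModel:
    source_station: int             # training station the model came from
    intercept: float
    coefficients: Dict[str, float]  # hypothetical feature name -> weight

    def predict(self, features: Dict[str, float]) -> float:
        return self.intercept + sum(
            weight * features[name] for name, weight in self.coefficients.items()
        )


def deployment_mae(model: LinearModel, rows: List[Row]) -> float:
    """MAE on deployment rows, skipping rows with a missing target."""
    errors = [abs(model.predict(f) - y) for f, y in rows if y is not None]
    if not errors:
        raise ValueError("no rows with a known target")
    return sum(errors) / len(errors)


def select_model(models: List[LinearModel], rows: List[Row]) -> LinearModel:
    """Pick the model with the lowest MAE on the new station's data."""
    return min(models, key=lambda m: deployment_mae(m, rows))


# Toy example with two made-up models and made-up deployment rows.
models = [
    LinearModel(17, 0.0, {"bikes_now": 1.0}),
    LinearModel(42, 2.0, {"bikes_now": 0.5}),
]
rows: List[Row] = [
    ({"bikes_now": 4.0}, 4.0),
    ({"bikes_now": 10.0}, 7.0),
    ({"bikes_now": 2.0}, 3.0),
    ({"bikes_now": 6.0}, None),  # missing target, ignored
]
best = select_model(models, rows)
print(best.source_station)
```

Selecting a single model is the crudest form of reuse; averaging the predictions of the few best models is a natural next step under the same assumptions.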
CHALLENGE TIMELINE (UPDATED!)
- March 31, 2015: Training and deployment data, linear models, and leaderboard test data on-line
- May 4, 18, 25 and June 1, 2015: Leaderboard refreshed for submissions up to that time
- June 8, 2015 (NEW!): Final leaderboard refreshed for submissions up to that time
- June 15, 2015 (extended from June 8): Deadline to submit paper and source code
- June 16, 2015 (extended from June 9): Full test data available
- June 22, 2015: Deadline to submit predictions on the full test set
- July 6, 2015: Notification of acceptance
- August 3, 2015: Deadline to submit camera-ready version
- September 11, 2015: Challenge workshop at ECML PKDD 2015, Final results announced
All deadlines are 11:59pm in the latest timezone (American Samoa).
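For participants in other time zones, the deadline rule can be turned into a UTC instant. The sketch below assumes American Samoa is at a fixed offset of UTC-11 with no daylight saving time, and uses the June 15, 2015 paper deadline from the timeline as the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: American Samoa observes UTC-11 all year round.
AMERICAN_SAMOA = timezone(timedelta(hours=-11), "SST")

# Paper and source code deadline: June 15, 2015, 11:59pm American Samoa time.
deadline_local = datetime(2015, 6, 15, 23, 59, tzinfo=AMERICAN_SAMOA)
deadline_utc = deadline_local.astimezone(timezone.utc)

print(deadline_utc.isoformat())  # 2015-06-16T10:59:00+00:00
```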
SUBMISSIONS (UPDATED!)
- A leaderboard submission is a single CSV file with 3 columns: station number, timestamp, and the predicted number of bikes, see the file example_leaderboard_submission.csv at http://reframe-d2k.org/Challenge_Download. This file has to be sent by e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge leaderboard submission <1-12> from <Your Name>'. Each participant can submit up to 12 files (increased from 10 files) before or on June 8 (extended from June 1). The submissions after the 12th are ignored. On May 4, 18, 25, June 1 and June 8 all leaderboard submissions are evaluated for mean absolute error and the results are published on this site, together with the participant's name and submission number (1-12).
- The paper, in PDF format, should be uploaded to Easychair https://easychair.org/conferences/?conf=morebikes2015 and the source code sent as a single compressed file by e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge source code from <Your Name>'.
- A full test prediction submission is a single CSV file, formatted the same as a leaderboard submission and submitted to the same e-mail address with the subject 'Challenge full test submission from <Your Name>'.
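As a rough illustration of the submission format described above, the sketch below writes a three-column CSV and builds the leaderboard e-mail subject. The header names, the integer timestamps, and the helper names are assumptions made for this example; the authoritative layout is the file example_leaderboard_submission.csv on the download page.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

MAX_SUBMISSIONS = 12  # limit stated in the submission rules


def write_submission(rows: Iterable[Tuple[int, int, float]], out: TextIO) -> None:
    """Write (station, timestamp, predicted bikes) rows as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical header; check the example file for the real one.
    writer.writerow(["station", "timestamp", "bikes"])
    for station, timestamp, bikes in rows:
        writer.writerow([station, timestamp, bikes])


def leaderboard_subject(number: int, name: str) -> str:
    """Build the e-mail subject for leaderboard submission 1-12."""
    if not 1 <= number <= MAX_SUBMISSIONS:
        raise ValueError("submission number must be between 1 and 12")
    return f"Challenge leaderboard submission {number} from {name}"


# Made-up predictions for two deployment stations at one time point.
buffer = io.StringIO()
write_submission([(201, 1414800000, 5), (202, 1414800000, 0)], buffer)
print(buffer.getvalue())
print(leaderboard_subject(3, "Jane Doe"))
```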
EVALUATION AND RULES
The predictions are evaluated according to the mean absolute error between the predicted and true values. The winner is the participant who submitted the predictions with the lowest mean absolute error. In case of a tie, the approach (generality, efficiency) will be taken into account. All predictions have to be programmatically generated (not manually entered). The prediction for each test time-point is allowed to use only the given features of this instance in the test dataset (NOT THE FEATURES OF THE OTHER TEST TIME-POINTS) and all provided training and deployment data and the models. Other data sources are not allowed. The prediction on the full test data set must be obtained by running the submitted code without any changes and without any parameters other than the test file name.
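For reference, the mean absolute error over n test time-points is the average of |predicted - true| across all of them. A minimal sketch of the computation follows; the function name and the toy numbers are illustrative and this is not the organisers' evaluation script.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predictions and true values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Toy example: absolute errors are 2, 0, 1 and 3, so the MAE is 6 / 4.
predicted = [3, 0, 7, 12]
actual = [5, 0, 6, 9]
print(mean_absolute_error(predicted, actual))  # 1.5
```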
PRIZE
The three participants who provide the best predictions on the full test set are each awarded one free registration to the ECML-PKDD 2015 conference. CATEDRA INNDEA, ECML-PKDD and REFRAME sponsor these free registrations (at the early rate).
A special prize is awarded to the best student in the final leaderboard: a free one-year subscription in one city with self-service bicycles operated by Cyclocity http://en.cyclocity.com/Cities/Cyclocity-in-the-world.
ORGANISING COMMITTEE
- Nicolas Lachiche, University of Strasbourg, France (nicolas DOT lachiche AT unistra DOT fr)
- Meelis Kull, University of Bristol, UK (meelis DOT kull AT bristol DOT ac DOT uk)
- Adolfo Martínez-Usó, Universitat Politècnica de València, Spain (admarus AT upv DOT es)
ACKNOWLEDGEMENTS
The organising committee would like to thank the Altocumulo weather station for their help in collecting the weather information.