MoReBikeS: 2015 ECML-PKDD Challenge on "Model Reuse with Bike rental Station data"

Welcome to the main page of the ECML PKDD 2015 Challenge on Model Reuse with Bike rental Station data!

NEWS AND RESULTS

FINAL RESULTS ON FULL TEST DATA

Of the 23 teams in the small test data challenge, 10 participated in the full test data challenge. The results are as follows.

Rank	Team	Team members	Mean absolute error
#1	Hao	Hao Song, Peter Flach	2.0143
#2	Yu Chen	Yu Chen, Peter Flach	2.0515
#3	Arun B S	Arun Bala Subramaniyan, Rong Pan	2.0667
#4	DCC_UFLA_15	Fernando Simeone, Diego S. Mendes, Ahmed A. A. Esmin	2.0794
#5	Farhan	Chowdhury Farhan Ahmed	2.1027
#6	Denis	Denis Moreira dos Reis	2.1725
#7	thelastone	Víctor Núñez Monsálvez	2.2098
#8	arp	Andrés Ramos, Francisco Rangel	2.3092
#9	Tom&Niall	Tom Diethe, Niall Twomey, Peter Flach	2.3921
#10	Dmlab	Gergo Barta	2.4155

The top 3 participants each received a free registration to ECML/PKDD 2015 as their prize.

Full test data are available here: http://reframe-d2k.org/File:Morebikes_full_test_data.csv.zip

CHALLENGE WORKSHOP

The challenge workshop was held on September 11, 2015 in Porto at ECML-PKDD 2015.

The schedule is available here: http://users.dsic.upv.es/~flip/LMCE2015/MoReBikeS_Schedule.html.

The slides of Meelis Kull presenting the summary of the challenge are available here: http://reframe-d2k.org/File:2015_09_11_morebikes_summary.pdf

CHALLENGE WORKSHOP PAPERS

Here are the challenge workshop papers from all 10 full test data challenge participants.

Team	Authors	Title
arp	Andrés Ramos, Francisco Rangel	Autoritas participation at MoReBikeS: Model Reuse with Bike rental Station data
Arun B S	Arun Bala Subramaniyan, Rong Pan	Prediction of Bike Rental using Model Reuse Strategy
DCC_UFLA_15	Fernando Simeone, Diego S. Mendes, Ahmed A. A. Esmin	Nearest-Neighbor Distance Method Applied to Model Reuse With Bike Rental Station Data
Denis	Denis Moreira dos Reis	Selecting Training Data By Evaluating Existing Models: Reusing models for the MoReBikeS Challenge
Dmlab	Gergo Barta	Bike sharing model reuse framework for tree-based ensembles
Farhan	Chowdhury Farhan Ahmed	Reframing Bike Challenge Problem using Model Selection
Hao	Hao Song, Peter Flach	Model Reuse with Subgroup Discovery
thelastone	Víctor Núñez Monsálvez	Reusing Models Using K-Nearest Neighbors And Weighted Arithmetic Mean To Predict Future Use Of Bike Stations For The MoReBikeS Challenge 2015
Tom&Niall	Tom Diethe, Niall Twomey, Peter Flach	Gaussian Process Model Re-Use
Yu Chen	Yu Chen, Peter Flach	SVR-based Modelling for the MoReBikeS Challenge: Analysis, Visualisation and Prediction

BEST STUDENT AWARD

The MoReBikeS best student award was decided based on the final leaderboard on small test data (see below). The results are as follows:

Award	Rank among all participants	Name	University	Country
Best Student Award	1	Yu Chen	University of Bristol	UK
Best Student Runner-up	2	Victor Nuñez Monsalvez	Universitat Politècnica de València	Spain

The special prize of a free one-year Cyclocity subscription goes to Victor Nuñez Monsalvez: the winner kindly passed the prize to the runner-up because Cyclocity does not operate in the UK.

FINAL LEADERBOARD OF ALL PARTICIPANTS ON SMALL TEST DATA (June 8, 2015)

Earlier leaderboards are available at http://reframe-d2k.org/Challenge_Leaderboards

Rank	Name	Submission number	Mean absolute error
1	Yu Chen	10	2.376
2	thelastone	5	2.416
3	Hao	3	2.454
4	Arun B S	9	2.502
5	DCC_UFLA_15	1	2.532
6	lmontes	2	2.536
7	Farhan	2	2.554
8	arp	2	2.556
9	Denis	7	2.604
10	VLC8	3	2.690
11	masfworld	1	2.736
12	Dmlab	12	2.829
13	ComUnTir	1	2.922
14	Reem	2	3.068
15	Bikes 3h ago	Baseline	3.288
16	emakumea	2	3.288
17	kafka	3	3.288
18	AEslava	2	3.422
19	LMAF	2	3.422
20	GLN	4	3.458
21	BigBones	1	4.162
22	Robin	1	4.402
23	yeha	1	4.460
24	iseddel	1	4.518
25	VMM	1	4.612
26	All zeros	Baseline	7.550

FINAL LEADERBOARD OF ALL SUBMISSIONS ON SMALL TEST DATA (June 8, 2015)

Rank	Name	Submission number	Mean absolute error
1	Yu Chen	10	2.376
2	thelastone	5	2.416
3	thelastone	3	2.430
4	thelastone	6	2.430
5	thelastone	4	2.434
6	thelastone	7	2.444
7	Hao	3	2.454
8	Yu Chen	8	2.460
9	Yu Chen	7	2.461
10	Yu Chen	6	2.469
11	Hao	5	2.478
12	Hao	8	2.478
13	Hao	4	2.484
14	Hao	7	2.492
15	Hao	6	2.494
16	Yu Chen	3	2.496
17	Arun B S	9	2.502
18	Arun B S	12	2.514
19	Hao	9	2.514
20	Arun B S	8	2.519
21	Yu Chen	4	2.520
22	Yu Chen	9	2.520
23	Arun B S	11	2.526
24	Yu Chen	11	2.528
25	DCC_UFLA_15	1	2.532
26	lmontes	2	2.536
27	Hao	12	2.540
28	Farhan	2	2.554
29	arp	2	2.556
30	arp	3	2.558
31	Arun B S	10	2.560
32	Yu Chen	12	2.562
33	Hao	10	2.564
34	Farhan	1	2.572
35	Hao	11	2.590
36	Denis	7	2.604
37	Denis	4	2.608
38	Yu Chen	5	2.612
39	Yu Chen	2	2.625
40	Hao	1	2.634
41	Denis	3	2.642
42	Denis	9	2.642
43	Denis	12	2.646
44	Denis	11	2.658
45	Denis	6	2.680
46	VLC8	3	2.690
47	Hao	2	2.700
48	thelastone	2	2.722
49	Arun B S	4	2.724
50	masfworld	1	2.736
51	Denis	10	2.756
52	Denis	8	2.764
53	Denis	2	2.768
54	Arun B S	5	2.774
55	Denis	5	2.774
56	Denis	1	2.778
57	Dmlab	12	2.829
58	lmontes	1	2.912
59	thelastone	1	2.912
60	ComUnTir	1	2.922
61	Dmlab	10	2.966
62	Arun B S	6	3.068
63	Arun B S	7	3.068
64	Reem	2	3.068
65	Dmlab	5	3.099
66	Dmlab	9	3.199
67	Bikes 3h ago	Baseline	3.288
68	emakumea	2	3.288
69	kafka	3	3.288
70	masfworld	3	3.288
71	Dmlab	11	3.311
72	Dmlab	8	3.315
73	Dmlab	7	3.316
74	Dmlab	4	3.390
75	Yu Chen	1	3.414
76	AEslava	2	3.422
77	LMAF	2	3.422
78	Dmlab	3	3.451
79	GLN	4	3.458
80	Dmlab	2	3.471
81	Arun B S	3	3.556
82	Dmlab	1	3.606
83	GLN	3	3.632
84	Arun B S	1	3.636
85	Arun B S	2	3.640
86	kafka	6	3.652
87	kafka	5	3.678
88	GLN	2	3.762
89	Dmlab	6	3.791
90	AEslava	4	3.854
91	LMAF	4	3.854
92	VLC8	2	3.924
93	AEslava	3	4.014
94	LMAF	3	4.014
95	AEslava	5	4.050
96	LMAF	5	4.050
97	BigBones	1	4.162
98	Reem	1	4.332
99	Robin	1	4.402
100	kafka	1	4.412
101	yeha	1	4.460
102	GLN	1	4.468
103	iseddel	1	4.518
104	emakumea	1	4.522
105	VLC8	1	4.538
106	kafka	4	4.580
107	VMM	1	4.612
108	arp	1	4.640
109	BigBones	2	4.664
110	BigBones	3	4.674
111	AEslava	1	4.776
112	LMAF	1	4.776
113	kafka	2	4.846
114	VMM	2	5.166
115	masfworld	4	5.476
116	yeha	2	6.568
117	masfworld	2	6.898
118	All zeros	Baseline	7.550

ABOUT THE CHALLENGE

INTRODUCTION AND MOTIVATION

The task in this challenge is to predict the number of available bikes in every bike rental station 3 hours in advance. There are at least two use cases for such predictions. First, a user plans to rent (or return) a bike in 3 hours' time and wants to choose a bike station which is not empty (or full). Second, the company wants to avoid situations where a station is empty or full and therefore needs to move bikes between stations. For this purpose it needs to know which stations are more likely to be empty or full soon.

In both cases the prediction can be based on the time of day, week, or year and on the weather conditions. Information about the current status of the station can also be used. A successful predictor needs to take all of these aspects into account, as well as the profile of bike availability at the station, learned from historical information. The more historical information is available, the better the predictions can be.

In this challenge we explore a setting where there are 200 stations which have been running for more than 2 years and 75 stations which have been open for just a month. The task is to reuse the models learned on the 200 "old" stations in order to improve prediction performance on the 75 "new" stations. Hence, this challenge evaluates prediction performance on the 75 new stations. If we gave full historical data about the 200 stations, we would be evaluating model building and model reuse performance at the same time. Therefore, we have decided to build the models ourselves and provide them without the full data they were trained on. Still, full training data for 10 stations is provided in order to facilitate analysis of how a model can be reused at other stations and at later times. For the remaining 190 training stations we provide data for one month only, also to help in deciding how the models can be reused.

PARTICIPATION

This challenge is open for everyone to participate by submitting predictions to the public leaderboard which is refreshed on May 4, 18, 25 and June 1. The results of the last leaderboard will be immediately published as the final results of the small test set challenge.

We encourage everyone to participate in the full test set challenge as well. This requires submitting the code and a paper describing the chosen prediction method by June 15 (changed from June 8) and the predictions on full test data by June 22. The main focus of the paper should be to explain the solution to other participants and interested people; comparison to other existing methods is not required. The accepted papers are presented at the challenge workshop at ECML PKDD 2015 on September 11, 2015. The winner of the MoReBikeS challenge is the presenting author with the lowest mean absolute error on the full test data.

TASK

The task is to predict the number of bikes in the stations 3 hours in advance.
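
A natural reference point for this task is a persistence forecast, which predicts that the number of bikes in 3 hours equals the number observed now; the leaderboard baseline named "Bikes 3h ago" appears to be of this kind. The following is a minimal sketch under assumptions: the hourly counts are made up, and the function name persistence_forecast is hypothetical rather than part of the challenge material.

```python
def persistence_forecast(counts, horizon=3):
    """Return (target_hour_index, prediction) pairs for a persistence forecast.

    counts  -- hourly bike counts for one station, oldest first
    horizon -- how many hours ahead to predict
    """
    # The prediction for hour t + horizon is simply the count observed at hour t.
    return [(t + horizon, counts[t]) for t in range(len(counts) - horizon)]


hourly_counts = [5, 6, 8, 7, 4, 2, 3]  # hypothetical counts for one station
forecasts = persistence_forecast(hourly_counts)
print(forecasts)
```

Any reused model should at least improve on a forecast of this kind, since it needs no training data at all.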

DESCRIPTION OF DATA AND MODELS

The challenge is to reuse the models learned in 200 training stations (numbered from 1 to 200) for 75 deployment stations (numbered from 201 to 275). The linear models have been trained on the data of the training stations from the period June 2012-September 2014. The deployment data covers all 275 stations for October 2014. The test data covers the 75 test stations in the period November 2014-January 2015. Test data for the leaderboard covers 25 test stations in the period November 2014-December 2014. Full test data for the 50 other test stations, from the period November 2014-January 2015, is given to participants after paper submission. The training and deployment datasets cover all hours of the respective periods; however, some time points have missing values, including in the target variable. All the data and models, together with detailed information, are available here: http://reframe-d2k.org/Challenge_Download.
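
One simple way to reuse the provided linear models, shown here only as an illustrative sketch and not as the method of any participant, is to score every pre-trained model on a new station's deployment month and keep the one with the lowest mean absolute error. The model representation, the feature name bikes_now, and all numbers below are assumptions made for the example; rows with a missing target are skipped.

```python
def predict(model, features):
    """Apply a linear model given as (intercept, {feature_name: weight})."""
    intercept, weights = model
    return intercept + sum(w * features[name] for name, w in weights.items())


def select_model(models, deployment_rows):
    """Return the name of the model with the lowest MAE on the deployment rows.

    Rows whose target is None (missing) are skipped.
    """
    usable = [(x, y) for x, y in deployment_rows if y is not None]

    def mae(model):
        return sum(abs(predict(model, x) - y) for x, y in usable) / len(usable)

    return min(models, key=lambda name: mae(models[name]))


# Hypothetical pre-trained models from two "old" stations.
models = {
    "station_1": (0.0, {"bikes_now": 1.0}),
    "station_2": (2.0, {"bikes_now": 0.5}),
}
# Hypothetical deployment data for one "new" station:
# (features, number of bikes 3 hours later).
deployment_rows = [
    ({"bikes_now": 4.0}, 4.0),
    ({"bikes_now": 8.0}, 6.0),
    ({"bikes_now": 10.0}, None),  # missing target, skipped
    ({"bikes_now": 2.0}, 3.0),
]
best = select_model(models, deployment_rows)
print(best)
```

Selecting a single model is the crudest form of reuse; averaging or re-weighting several well-scoring models follows the same pattern.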

CHALLENGE TIMELINE (UPDATED!)

March 31, 2015: Training and deployment data, linear models, and leaderboard test data on-line
May 4, 18, 25 and June 1, 2015: Leaderboard refreshed for submissions up to that time
June 8, 2015 (NEW!): Final leaderboard refreshed for submissions up to that time
June 15, 2015 (extended from June 8): Deadline to submit paper and source code
June 16, 2015 (extended from June 9): Full test data available
June 22, 2015: Deadline to submit predictions on the full test set
July 6, 2015: Notification of acceptance
August 3, 2015: Deadline to submit camera-ready version
September 11, 2015: Challenge workshop at ECML PKDD 2015, Final results announced

All deadlines are 11:59pm in the latest timezone (American Samoa).
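
As a worked example of this rule, the sketch below converts the paper deadline to UTC. It assumes only that American Samoa is at UTC-11 all year, with no daylight saving time; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# American Samoa observes UTC-11 all year (no daylight saving time).
AMERICAN_SAMOA = timezone(timedelta(hours=-11))

# Paper and source code deadline: June 15, 2015, 11:59pm local time.
paper_deadline = datetime(2015, 6, 15, 23, 59, tzinfo=AMERICAN_SAMOA)
paper_deadline_utc = paper_deadline.astimezone(timezone.utc)
print(paper_deadline_utc.isoformat())
```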

SUBMISSIONS (UPDATED!)

A leaderboard submission is a single CSV file with 3 columns: station number, timestamp, and the predicted number of bikes; see the file example_leaderboard_submission.csv at http://reframe-d2k.org/Challenge_Download. This file has to be sent as an e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge leaderboard submission <1-12> from <Your Name>'. Each participant can submit up to 12 files (increased from 10 files) before or on June 8 (extended from June 1). Submissions after the 12th are ignored. On May 4, 18, 25, June 1 and June 8 all leaderboard submissions are evaluated for mean absolute error and the results are published on this site, together with the participant's name and submission number (1-12).
The paper, in PDF format, should be uploaded to EasyChair (https://easychair.org/conferences/?conf=morebikes2015) and the source code sent as a single compressed file by e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge source code from <Your Name>'.
A full test prediction submission is a single CSV file, formatted the same as a leaderboard submission and submitted to the same e-mail address with the subject 'Challenge full test submission from <Your Name>'.
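
A three-column file of this shape could be produced as sketched below. The header names, the timestamp encoding, and the prediction values are assumptions made for illustration; the authoritative layout is the one in example_leaderboard_submission.csv.

```python
import csv
import io


def write_submission(rows, out):
    """Write (station, timestamp, predicted_bikes) rows as CSV to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station", "timestamp", "bikes"])  # assumed header names
    for station, timestamp, bikes in rows:
        writer.writerow([station, timestamp, bikes])


# Hypothetical predictions; the timestamp encoding is an assumption.
predictions = [(201, 1414800000, 7.5), (202, 1414800000, 3.0)]
buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write a real file, pass an object opened with open(path, "w", newline="") instead of the in-memory buffer.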

EVALUATION AND RULES

The predictions are evaluated according to the mean absolute error between the predicted and true values. The winner is the participant who submitted the predictions with the lowest mean absolute error. In case of a tie, the approach (generality, efficiency) will be taken into account. All predictions have to be programmatically generated (not manually entered). The prediction for each test time-point is allowed to use only the given features of this instance in the test dataset (NOT THE FEATURES OF THE OTHER TEST TIME-POINTS) and all provided training and deployment data and the models. Other data sources are not allowed. The prediction on the full test data set must be obtained by running the submitted code without any changes and without any parameters other than the test file name.
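
The mean absolute error is the average of |prediction - true value| over all test instances. The sketch below computes it for a made-up set of values and for an all-zeros predictor of the kind listed as a baseline on the leaderboard; the numbers and variable names are illustrative only and are not taken from the challenge data.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and true values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


true_bikes = [4, 0, 10, 6]                # hypothetical true counts
model_predictions = [5.0, 1.0, 8.0, 6.0]  # hypothetical model output
all_zeros = [0.0] * len(true_bikes)       # always predict an empty station

mae_model = mean_absolute_error(model_predictions, true_bikes)
mae_zeros = mean_absolute_error(all_zeros, true_bikes)
print(mae_model, mae_zeros)
```

Because errors are not squared, a single badly mispredicted hour costs no more than several small misses of the same total size.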

PRIZE

The three participants who provide the best predictions on the full test set are each awarded one free registration to the ECML-PKDD 2015 conference. CATEDRA INNDEA, ECML-PKDD and REFRAME sponsor these free registrations (at the early rate).

A special prize is awarded to the best student in the final leaderboard: a free one-year subscription in one city with self-service bicycles operated by Cyclocity (http://en.cyclocity.com/Cities/Cyclocity-in-the-world).

ORGANISING COMMITTEE

Nicolas Lachiche, University of Strasbourg, France (nicolas DOT lachiche AT unistra DOT fr)
Meelis Kull, University of Bristol, UK (meelis DOT kull AT bristol DOT ac DOT uk)
Adolfo Martínez-Usó, Universitat Politècnica de Valencia, Spain (admarus AT upv DOT es)

ACKNOWLEDGEMENTS

The organising committee would like to thank the Altocumulo weather station for its help in collecting the weather information.
