Machine Learning and Data Mining in Practice
The main purpose of this course is to take part in KDD Cup, a well-known data mining competition held in conjunction with KDD-2012, the premier conference on data mining and knowledge discovery. In the competition, you are expected to get your hands dirty and do data mining on large real-world data sets. Last year, it had two tracks: the first aimed at predicting the scores that users gave to various items, while the second required separating loved songs from other songs. The SJTU-HKUST team won 3rd place in KDD Cup 2011. Based on last year, the competition should start in March and end on June 30.
You will learn how to apply data mining and machine learning techniques to real-world problems. You will cooperate with your teammates, learn new algorithms and techniques, implement them, and test them on the data sets. We hope this will provide an alternative to the Fatworm course projects for students interested in related topics (students attending this course will no longer need to work on Fatworm).
Note that we expect you to already have some knowledge of data mining and machine learning. The course will be intensive: we expect you to spend more than 10 hours each week on average, and each team will need to present its ideas to the others every week.
To make effective use of our collective intelligence and be more competitive in KDD Cup, we will:
- Build a private wiki to share the knowledge base (related papers and software).
- Build a platform and code framework to develop and test algorithms. Each student will be assigned a seat in APEXLab, with appropriate computing resources.
- Form independent teams. Independent teams will help discover different algorithms and achieve better performance in the end. Each team will have 3 members (as in the Fatworm projects), including 1 team leader. We will merge all the teams and work closely together during the last phase of the contest; our final goal is to win KDD Cup as ONE team. The general rules for submission will be given by the TFs after the contest starts.
- Hold regular meetings. One of our ultimate goals is to win KDD Cup, so we will frequently share experience between teams. We will hold meetings every week, and each team will report its progress at them. It is OK not to use PPT in these meetings.
- We offer this course to those who have a strong interest in working in ML and DM. We emphasize that it will require much more effort than Fatworm.
- Note that we will be competing against ML and DM researchers from all over the world, not against undergraduate students like ourselves. This is a good chance to see how far we can push our limits.
- Start preparing early. Since we are undergrads with limited experience in DM and ML, early preparation is essential.
The final score depends on your contribution to this course, not just the performance of your code. If you implement a model that is not strong by itself but helps other models achieve better results, that is wonderful. If you devise an algorithm that others find effective, you will also receive credit. The final grades will be given by the teaching fellows (TFs) of the course. Since we expect to meet regularly and work together, the TFs will be very familiar with your contribution.
- SJTU's KDD Cup 2011 project page.
- National Taiwan University's course page. NTU has a course page in the same spirit as this document. Please read it carefully if you are still not sure what we are doing here.
- Stanford's machine learning class.
- The course will be limited to 12 students at most.
- To apply, please send an email to Tianqi Chen with a short statement of your purpose and related background before Jan. 15th.
- We emphasize again that this course will be more intensive and challenging, and will require much more effort, than the Fatworm project.
Tianqi Chen: tqchen [at] apex [dot] sjtu [dot] edu [dot] cn
Linpeng Tang: chnttlp [at] gmail [dot] com