Distilling the Knowledge in a Neural Network

1. Basic Information

Authors: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
Paper status: NIPS 2014 Deep Learning Workshop
Link: https://arxiv.org/abs/1503.02531

2. Summary

2.1. Key Idea

The transfer set used to train the small model can consist of unlabeled data, but reusing the original training set seems to work well in practice.
Use a softmax with temperature T; values from 1 to 10 seem to work well, depending on the problem.
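A minimal NumPy sketch of the softmax with temperature: the logits are divided by T before the usual softmax, so a larger T flattens the distribution without changing which class ranks highest. The function name `softmax_with_temperature` and the example logits are illustrative assumptions.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Softmax over the last axis after dividing the logits by temperature T."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([10.0, 5.0, 1.0])
for T in (1.0, 4.0, 10.0):
    # Higher T spreads probability mass onto the smaller logits.
    print(T, np.round(softmax_with_temperature(logits, T), 3))
```

At T = 1 this is the ordinary softmax; at higher T the relative sizes of the small probabilities (the "dark knowledge" about which wrong classes are similar) become visible.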
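The distilled MNIST network learns to recognize a digit it never saw in its transfer set (the paper omits every example of the digit 3), solely from the "errors" the teacher network makes, i.e. the small probabilities it assigns to other classes. The output bias for the unseen class needs to be raised afterwards.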
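The bias correction can be sketched as follows, assuming a student whose output layer computes W·h + b with a per-class bias vector. The helper `adjust_class_bias`, the scores, and the shift of 1.5 are hypothetical numbers chosen only to show how raising the unseen class's bias flips the prediction; they are not values from the paper.

```python
import numpy as np

def adjust_class_bias(bias, class_idx, delta):
    """Return a copy of the output-layer bias with one class's entry shifted by delta."""
    new_bias = np.array(bias, dtype=np.float64, copy=True)
    new_bias[class_idx] += delta
    return new_bias

# Hypothetical pre-bias scores (W @ h) for one test image of the unseen digit 3.
scores = np.array([0.1, -1.0, 0.5, 2.0, -0.5, 0.3, -2.0, 0.0, 2.4, 0.2])
bias = np.zeros(10)
bias[3] = -1.0  # the never-seen class ends up with a learned bias that is too low

print(np.argmax(scores + bias))                              # 8: misclassified
print(np.argmax(scores + adjust_class_bias(bias, 3, 1.5)))   # 3: correct after the shift
```

The weights already encode what a 3 looks like (learned from the teacher's soft targets on other digits); only the prior for that class is off, which is why a single bias shift is enough.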

2.2. Contributions

Training on soft targets with only a small subset of the data performs much better than training on hard targets with the same amount of data.
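One way to write the combined objective, following the paper's description of a weighted average of a soft-target term and a hard-target term, with the soft term multiplied by T² because its gradients scale as 1/T². This is a NumPy sketch; the function names and the defaults `T=4.0` and `alpha=0.9` are assumptions for illustration.

```python
import numpy as np

def log_softmax(logits, T=1.0):
    """Log of the temperature softmax over the last axis, computed stably."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """alpha * T**2 * soft-target cross-entropy + (1 - alpha) * hard-label cross-entropy.

    Both terms are averaged over the batch. The soft term compares the teacher's and
    the student's softmax at temperature T; the hard term uses the true labels at T = 1.
    """
    p_teacher = np.exp(log_softmax(teacher_logits, T))
    soft = -(p_teacher * log_softmax(student_logits, T)).sum(axis=-1).mean()
    log_q = log_softmax(student_logits, 1.0)
    hard = -log_q[np.arange(len(labels)), labels].mean()
    return alpha * (T ** 2) * soft + (1.0 - alpha) * hard

teacher = np.array([[6.0, 2.0, -1.0], [0.5, 4.0, 1.0]])
labels = np.array([0, 1])
print(distillation_loss(teacher, teacher, labels))                 # student copies the teacher
print(distillation_loss(np.zeros_like(teacher), teacher, labels))  # uninformed student
```

Setting `alpha=0.0` recovers ordinary cross-entropy on the hard labels, which makes it easy to compare the two training regimes on the same data.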

3. Explanation

Jung Young-jae (정영재) gave an excellent presentation of this paper at PR12, so let's use it as a reference.

from IPython.display import HTML
# Embed the PR12 talk on this paper (YouTube playlist entry) in the notebook
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/tOItokBZSfU?list=PLlMkM4tgfjnJhhd4wn5aj8fVTYJwIpWkS" frameborder="0" allowfullscreen></iframe>')

4. TensorFlow Implementation

What do we need to build?
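A minimal end-to-end sketch of the pieces the implementation needs: a teacher that produces logits, a transfer set, soft targets at temperature T, and a student trained on them using the paper's per-logit gradient (q_i - p_i) / T. It uses NumPy rather than TensorFlow so it stays self-contained; the linear teacher and student, the data sizes, T = 4, the learning rate, and all variable names are assumptions chosen for illustration. In TensorFlow, the soft-target step would correspond to `tf.nn.softmax(logits / T)`.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits, T=1.0):
    """Temperature softmax over the last axis (max-subtracted for stability)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# 1. Teacher (hypothetical stand-in): a fixed random linear map to 10 class logits.
n_features, n_classes = 20, 10
W_teacher = rng.normal(size=(n_features, n_classes)) * 2.0

# 2. Transfer set: unlabeled inputs. The teacher supplies the targets, so no labels are used.
X = rng.normal(size=(512, n_features))

# 3. Soft targets at temperature T.
T = 4.0
soft_targets = softmax(X @ W_teacher, T)

# 4. Student: a linear softmax classifier trained by gradient descent on the soft targets.
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
lr = 0.5

def soft_cross_entropy(W, b):
    q = softmax(X @ W + b, T)
    return -(soft_targets * np.log(q + 1e-12)).sum(axis=1).mean()

initial_loss = soft_cross_entropy(W, b)
for _ in range(2000):
    q = softmax(X @ W + b, T)
    # Per example, dC/dz_i = (q_i - p_i) / T; averaged over the batch here.
    grad_z = (q - soft_targets) / (T * len(X))
    # Multiply by T**2 because soft-target gradients shrink roughly as 1/T**2.
    W -= lr * T**2 * (X.T @ grad_z)
    b -= lr * T**2 * grad_z.sum(axis=0)
final_loss = soft_cross_entropy(W, b)

# Fraction of transfer-set inputs where the student's top class matches the teacher's.
agreement = np.mean((X @ W + b).argmax(axis=1) == (X @ W_teacher).argmax(axis=1))
print(f"soft CE: {initial_loss:.3f} -> {final_loss:.3f}, top-1 agreement: {agreement:.2%}")
```

A real implementation would replace the linear maps with a large trained teacher network and a smaller student network, and would typically add the hard-label term as well; the data flow (teacher logits, soft targets at T, student trained at the same T) stays the same.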
