Distilling the Knowledge in a Neural Network

1. Basic Information
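
  • Geoffrey Hinton, Oriol Vinyals, Jeff Dean, "Distilling the Knowledge in a Neural Network", NIPS 2014 Deep Learning and Representation Learning Workshop (arXiv:1503.02531).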

2. Summary

2.1. Key Idea

  • Unlabeled data can be used to transfer knowledge, but using the original training data seems to work well in practice.
  • Use a softmax with temperature; values from 1 to 10 seem to work well, depending on the problem (see the sketch after this list).
  • The MNIST student network learns to recognize a digit class it never saw during transfer (the paper omits all 3s from the transfer set), solely from the "errors" the teacher network makes on the other digits. (The bias of the omitted class needs to be adjusted.)
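
A minimal sketch (not from the original notebook) of the softmax with temperature; it assumes plain NumPy, and the example logits and temperature values are made up:

In [ ]:
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / T; a higher T gives a softer distribution."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                       # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [10.0, 5.0, 1.0]              # hypothetical teacher logits
print(softmax_with_temperature(logits, temperature=1.0))  # peaked, near one-hot
print(softmax_with_temperature(logits, temperature=5.0))  # softer, exposes relative probabilities of wrong classes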

2.2. Contributions

  • Training on soft targets with less data performs much better than training on hard targets with the same amount of data.

3. Explanation

  • 정영재 gave an excellent presentation of this paper at PR12, so refer to that talk (embedded below).
In [2]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/tOItokBZSfU?list=PLlMkM4tgfjnJhhd4wn5aj8fVTYJwIpWkS" frameborder="0" allowfullscreen></iframe>')
Out[2]:

4. TensorFlow Implementation

What do we need to build?
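
The notebook leaves the implementation cell below empty. As a starting point, here is a minimal sketch (an assumption, not the author's implementation) of the paper's distillation loss in TensorFlow: a weighted sum of the soft-target cross-entropy at temperature T, scaled by T², and the ordinary hard-label cross-entropy. The name distillation_loss, the weight alpha, and the default values are illustrative.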

In [ ]:
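import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Soft-target loss (scaled by T^2) plus hard-label cross-entropy.

    A sketch of the loss described in the paper; names and defaults are illustrative.
    """
    # Teacher probabilities softened by the temperature ("soft targets")
    soft_targets = tf.nn.softmax(teacher_logits / temperature)
    # Cross-entropy between the soft targets and the student's softened predictions
    soft_loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=soft_targets, logits=student_logits / temperature)
    # Ordinary cross-entropy with the true (hard) labels
    hard_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=student_logits)
    # The paper multiplies the soft term by T^2 so its gradient magnitude
    # stays comparable when the temperature is changed
    return tf.reduce_mean(alpha * (temperature ** 2) * soft_loss
                          + (1.0 - alpha) * hard_loss)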