There may be duplicates because some domains are published in multiple logs. I get data from all of the logs currently included with Chrome, a list of which is available here: https://github.com/google/certificate-transparency-community....
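
If the duplicates get in the way, they can be collapsed after the fact. Below is a minimal sketch, assuming the data has already been flattened into (log name, domain) pairs. The function name `dedupe_domains`, the pair layout, and the sample log names are hypothetical and are not part of the actual dataset or tooling.

```python
from typing import Iterable, Iterator, Tuple


def dedupe_domains(entries: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield each domain once, in first-seen order.

    `entries` is assumed to be (log_name, domain) pairs; this layout is
    hypothetical and only serves the illustration.
    """
    seen = set()
    for _log_name, domain in entries:
        # Normalize so that "Example.com." and "example.com" count as one domain.
        key = domain.strip().lower().rstrip(".")
        if key and key not in seen:
            seen.add(key)
            yield key


if __name__ == "__main__":
    # Made-up sample: the same domain reported by two different logs.
    sample = [
        ("log-a", "example.com"),
        ("log-b", "Example.com."),
        ("log-a", "example.org"),
    ]
    for name in dedupe_domains(sample):
        print(name)
```

The set holds every distinct domain in memory, so for a very large dump an external sort followed by a uniqueness pass may be a better fit.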