It might be time to consider adding missing data to IMDb's non-commercial use download datasets.
The number of entries missing from the IMDb downloadable datasets seems to have increased substantially over the last few years. By my quick estimate, about half of IMDb's titles are missing (this assumes a third of the tconst numbers are bad/deleted). As far as I can tell, this is because two categories (Music Video and Podcast) are apparently excluded from the extraction process. Although the provided data is somewhat minimal in some cases, the relatively recent redesigns and removals of online pages have made this data far more important to some users who work with IMDb data offline (e.g., I keep an offline database of my video collection, the many differences between it and IMDb data, and potential titles I'd like to add to my collection). Now that it is more difficult to retrieve things like parentTconst or the correct date for a series episode from the online filmography pages, the provided data matters more.
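For anyone who wants to check the gap on their own copy, here is a rough Python sketch of one way to count it. It is not an exact reproduction of the estimate above. It assumes every tconst is "tt" followed by digits, that numbering starts at tt0000001, and that the highest ID in the file is close to the highest ID IMDb has assigned. The function names (tconst_coverage, read_tsv) are made up for this example, and the count of absent IDs includes bad/deleted numbers as well as excluded categories.

```python
import csv
import gzip
from typing import Iterable, Iterator


def tconst_coverage(rows: Iterable[dict]) -> dict:
    """Compare the number of distinct tconst IDs with the highest ID seen."""
    seen = set()
    for row in rows:
        tconst = row["tconst"]
        # Assumption: IDs look like "tt" + digits, e.g. tt0000001.
        if tconst.startswith("tt") and tconst[2:].isdigit():
            seen.add(int(tconst[2:]))
    highest = max(seen, default=0)
    present = len(seen)
    absent = highest - present  # slots 1..highest with no row in the file
    return {
        "present": present,
        "highest_id": highest,
        "absent": absent,
        "absent_share": absent / highest if highest else 0.0,
    }


def read_tsv(path: str) -> Iterator[dict]:
    """Yield rows from a gzipped, tab-separated file with a header line."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: treat quote characters in titles as ordinary text.
        yield from csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
```

With a local download, something like tconst_coverage(read_tsv("title.basics.tsv.gz")) would give the counts; the path is only a placeholder for wherever the file was saved.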
I'm not familiar enough with the other tsv datasets to know how much impact adding the two categories would have on them, but I'd really like to see the relatively small Music Video data added to title.basics and the rather large Podcast data added to both title.basics and title.episode. If IMDb doesn't want to double the size of these two files, perhaps two new files in the same column format (podcast.title.basics and podcast.title.episode) would work.