Aoimirai - Kpop tools Full Disclaimers

As with any data mining, aggregation and statistics, there are plenty of disclaimers on how data are acquired and treated. As much as I would like to leave every single detail available as soon as possible, it would make the site pretty unreadable with so much fine print and so many disclaimers everywhere, so this page contains all of them.

Artist database

Artists are added manually as they become famous or when someone reports a missing one. General data is gathered from Wikipedia and fansites. If you find anything wrong with an artist's data, such as name, debut video, social media and so on, mail me. I do not monitor new profiles and accounts for artists, so over time some of my information about their social media will become outdated.

Some small (usually only one MV) sub-units or soloists are not added and instead are listed inside the main group for simplicity's sake. The biggest example of such a move is LOOΠΔ and its pre-debut sub-units.

As a side note, while we do not allow videos not hosted by official channels, some debut videos that were released before Korean studios started posting their releases on Youtube, and which were never officially posted, might be linked to a non-official account if that is the only available video. The Debut system also uses a date correction in case the date of the debut video is not the release date (it was posted later, therefore the date on Youtube is not when the video was originally released).
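The date correction described above can be pictured with a small sketch. This is only an illustration, not the site's actual code: the `DebutVideo` class, its field names and the example dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DebutVideo:
    # Hypothetical record layout, for illustration only.
    youtube_published: date                    # date shown on Youtube
    corrected_release: Optional[date] = None   # manual correction, if any


def debut_date(video: DebutVideo) -> date:
    """Prefer the manual correction; fall back to the Youtube date."""
    return video.corrected_release or video.youtube_published


# A video uploaded years after its release, and one posted on release day
# (both sets of dates are made up).
reuploaded = DebutVideo(date(2012, 5, 1), corrected_release=date(2005, 3, 10))
native = DebutVideo(date(2016, 8, 8))

print(debut_date(reuploaded))  # uses the corrected date
print(debut_date(native))      # uses the Youtube date
```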

Youtube data (views and likes)

Videos are added manually when they are released, when a new artist is added to the database, or when they are reported as missing. Since I maintain this site alone, new releases might take a few days to be registered. Also please allow a few days for me to catch up on reports and mails if you send one.

Views are gathered by the main cron bot, which crawls Youtube gathering views and likes almost every minute. Since there are about 3000 videos to fetch data for, and the bot averages 50~70 crawls per hour, it can take up to 3 days to finish updating every video before the cycle starts again. However, priority videos (big views, recent releases, high view/day) get updated more frequently (some daily, recent ones every 12 hours), which increases the time to update every video to 4 or 5 days.
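As a rough check of those numbers, the length of one full cycle without priority videos can be estimated from the figures above. This is a back-of-the-envelope sketch; the function name is made up for the example.

```python
def full_cycle_days(videos: int, crawls_per_hour: float) -> float:
    """Days needed to crawl every video once at a steady rate."""
    hours = videos / crawls_per_hour
    return hours / 24


slow = full_cycle_days(3000, 50)  # slowest rate quoted above
fast = full_cycle_days(3000, 70)  # fastest rate quoted above

print(f"{fast:.2f} to {slow:.2f} days per full cycle")
```

At 50 crawls per hour this gives 2.5 days, which matches the "up to 3 days" figure once some slack is allowed for; the extra crawls spent on priority videos are what stretch the real cycle further.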

On the first day of the month, the bot prioritizes videos and artists on the top lists to get the most accurate top list around 18:00 UTC for the Top History page.

Videos that are deleted get flagged and are never updated again. I should point out that sometimes a video is temporarily disabled but returns later, and since I cannot check for that, there is a small possibility that a video marked as removed ends up returning.

Note that the bot only reads the HTML file from Youtube. It never touches media files and therefore never starts a view.

Follower data

A second bot runs every 30 minutes to gather followers. It checks Facebook, Twitter, Youtube channels, Youtube users, vLive profiles and Instagram and detects the followers on each platform for each artist (on each run, it checks all those sites for a single artist). Since we have about 300 artists, it can take up to a week for all follower data to be updated. Profiles that are closed are removed from the update cycle.
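The "up to a week" estimate follows directly from one artist per run. A quick sketch of the arithmetic, using the approximate figures above:

```python
from datetime import timedelta

artists = 300                      # approximate number of artists
run_every = timedelta(minutes=30)  # one artist is checked per run

# Time for the bot to visit every artist once.
cycle = run_every * artists

print(cycle)                         # 6 days, 6:00:00
print(cycle.total_seconds() / 86400) # 6.25 days
```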

Please note that Facebook rounds the number of followers to the nearest thousand.

Also note that I do not follow up when new accounts for artists are created and therefore might be missing some. The only fully trustworthy ones are the vLive accounts, which are unique per artist and don't change.

Sales

Sales are calculated as the sum of data from before 2011 (hard to find) and data from 2011 onward, when GAON started.

For sales PRIOR to 2011, dubbed the MIAK era (Music Industry Association of Korea), data are hard to find and usually require digging into archives and fan pages. Fans often update Wikipedia with some data, but few artist pages are fully accurate and up to date. The initial MIAK data was gathered from Wikipedia, with other sources used for some artists depending on how readily available and trustworthy they are. While no HANTEO data is used for the GAON period, there is a big chance some of the MIAK era data comes from HANTEO. Whenever possible, how MIAK era data were gathered is displayed when you click "Sales composition" in an artist.

GAON uses a different method than HANTEO: GAON counts physical copies shipped through distribution lines, not sales to end users, while HANTEO uses real-time reports from stores about sales to end users. For instance, when a new debut happens, the studio will order a certain number of physical copies to be distributed to stores. Stores will then purchase these as they see fit (they might not purchase all printed copies). GAON adds the copies that were shipped to stores, regardless of whether they end up being sold. Because of this, as time goes by and stores return unsold items, GAON corrects its data accordingly. GAON is known to be an accurate source for older releases because of that, but since it does not have current sales, it is a terrible source for debuts and new releases. For that reason TV shows, awards and such use HANTEO data for up-to-date sales. Sometimes GAON data is used for end-of-year awards.

Unfortunately, in practice GAON is far from accurate, and in my experience they are very disorganized on their own. Other than their internal disorganization, a common problem with both GAON and HANTEO (less with GAON) is that fans usually buy releases in bulk to try to influence shows and awards, and then RETURN those copies for a full refund. Both GAON and HANTEO then have to revise the sales down, so sales figures often decrease (this happens faster with HANTEO since it's real-time, but happens about the same with GAON anyway). The problem is that GAON does not report these updates in real time, and on top of that, it only reports the top 100 sales each month. Therefore, small numbers that can add up to a big difference month after month are not shown. The end-of-year report, which does contain these updates, also only shows the top 100, so we still miss a lot of tweaking. Therefore, even raw GAON data is not reliable.

GAON sales on this site are gathered by a bot that sums all monthly sales from GAON, then uses the GAON yearly report to detect and correct updates. This is the most accurate one can get from GAON data, but as explained above, it has plenty of limitations. Also, since we need the yearly report to correct data, all sales for the current year should be taken with a grain of salt. They could be incorrect, and we need to wait for the yearly report to correct them.
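A minimal sketch of the "sum monthly, then correct with yearly" idea. The exact correction rule the bot uses is not described here, so this example simply assumes that the yearly figure replaces the monthly sum whenever an album appears in the yearly chart. The function name, album names and numbers are all made up.

```python
from collections import defaultdict


def yearly_totals(monthly_charts, yearly_chart):
    """Sum monthly top-100 sales per album, then apply yearly corrections.

    monthly_charts: list of {album: copies} dicts, one per month.
    yearly_chart:   {album: copies} from the end-of-year report.
    """
    totals = defaultdict(int)
    for chart in monthly_charts:
        for album, copies in chart.items():
            totals[album] += copies
    # Assumption: the yearly figure already accounts for returns, so it
    # overrides the monthly sum for every album it lists.
    totals.update(yearly_chart)
    return dict(totals)


monthly = [
    {"Album A": 50000, "Album B": 8000},  # month 1
    {"Album A": 20000},                   # month 2 (Album B fell off the top 100)
]
yearly = {"Album A": 65000}               # 5000 copies were returned

result = yearly_totals(monthly, yearly)
print(result)  # {'Album A': 65000, 'Album B': 8000}
```

Album B shows the limitation: it never reaches the yearly top 100, so any returns or extra sales outside the monthly top 100 stay invisible.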

Only physical copies sold in Korea are counted.

Other tools

The API and Database Download are provided as is. Data are acquired directly from the database. The API should be used for dynamic data (like views), while the database download is better for getting all data at once.

The "This day in Kpop" feature uses the date videos were published, usually in Korean time. The "This week" feature uses the week number a video was updated, taking into consideration that the first and last weeks of a year can overlap with the adjacent year (so the first week of 2018 contains the last week of 2017 and so on).
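The overlap between years can be illustrated with ISO 8601 week numbers from Python's standard library. This is only an assumption for the example; the site may number weeks differently.

```python
from datetime import date
from typing import Tuple


def week_key(published: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week number) for a date."""
    iso = published.isocalendar()
    return iso[0], iso[1]


# Days at the edge of a year can belong to a week of the adjacent year.
print(week_key(date(2018, 12, 31)))  # (2019, 1)
print(week_key(date(2017, 1, 1)))    # (2016, 52)
```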

While debut videos are marked, note that not all debuts had videos, so some artists might not have debuts marked on the site.

The Top 30 history has been gathered automatically on the first day of every month around 18:00 UTC since May 2017 (for a short period it might show doubled because of an auto-backup). Due to a bug in the data gathering, artist totals were incorrect and thus discarded up to May 2018. For that reason, we have data for top MVs starting May 2017, and for top artists only from May 2018.

 
