Many online media companies source their content from traditional content producers such as news agencies and record labels. Handling this content and presenting it in a common format is therefore a major challenge for these companies, especially when the content comes from several providers with differing presentation patterns. Content management at this level involves two broad tasks:
- Mapping all content to a common format for presentation and for ease of mining.
- Categorization of content by user intent.
Storing content in a common schema is essential if content mining is to be automated to any extent. The bigger challenge, however, comes afterwards, when the content has to be categorized. Categorization is normally done to help the user browse the content and to present relevant search results.
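As a rough illustration of the first task, the sketch below maps records from two providers onto one shared schema. The schema fields, provider names and field mappings are all hypothetical and chosen only for the example; a real feed would need richer handling (dates, media, rights metadata).

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical common schema shared by all providers."""
    title: str
    body: str
    source: str


# Hypothetical per-provider mappings: common field name -> provider field name.
FIELD_MAPS = {
    "agency_a": {"title": "headline", "body": "text"},
    "label_b": {"title": "track_name", "body": "description"},
}


def to_common(provider, record):
    """Convert one provider-specific record into the common schema."""
    mapping = FIELD_MAPS[provider]
    return ContentItem(
        title=record[mapping["title"]].strip(),
        body=record[mapping["body"]].strip(),
        source=provider,
    )


item = to_common("agency_a", {"headline": " Rates rise ", "text": "The bank acted."})
print(item)
```

Adding a new provider then only requires a new entry in the mapping table, which is what makes downstream mining easier to automate.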
Categorization can be based on several parameters. In the case of music, the problem is greatly simplified, since most music is categorized by genre and then by artist, where an artist can fall under several genres. Here the categorization can be fully automated once the taxonomy of genres and artists is in place. However, for language content that needs to be categorized by subject matter, a good deal of social intelligence is required to understand the topic to which an item belongs. In such cases, automated classification will always make some errors. The error rate may fall as the system is trained further, but it will not disappear. Hence, some manual effort is essential in such categorization. The trick is to figure out which parts should be automated and which should be manual, since there is a trade-off between cost and accuracy.
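For the music case, a minimal sketch of taxonomy-driven categorization might look like the following. The artist names and genres are placeholders, and the dictionary stands in for whatever taxonomy store is actually used.

```python
# Hypothetical taxonomy: each artist maps to one or more genres.
ARTIST_GENRES = {
    "Artist A": {"rock", "blues"},
    "Artist B": {"jazz"},
}


def categorize_track(artist):
    """Return the genres for a track, looked up purely from its artist."""
    # Unknown artists get no genres; they would need a taxonomy update.
    return sorted(ARTIST_GENRES.get(artist, set()))


print(categorize_track("Artist A"))
print(categorize_track("Unknown Artist"))
```

Once the taxonomy exists, no judgement is involved: the lookup is deterministic, which is why this case can be automated end to end.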
Categorization essentially can be broken down into two further tasks:
- Finding categories ‘related’ to an item.
- Deciding which of those categories is/are actually ‘relevant’.
When every item is mapped to exactly one category (a one-to-one mapping), the second task can essentially be combined with the first. In a one-to-many mapping, however, social intelligence (available through a manual workforce) is needed to determine which categories are relevant.
As a general rule (with exceptions), finding related categories should be automated, especially in a one-to-many mapping, since an algorithm will do a more exhaustive job than a human. Deciding which of these categories are relevant is different: only a manual workforce can bring the level of social intelligence required to determine the subject matter of an item and categorize it accordingly.
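The split between the two tasks can be sketched as follows. This is only an illustration under simple assumptions: related categories are found by naive keyword overlap (a stand-in for any real matching algorithm), the keyword lists are invented, and the human decision is modelled as a `reviewer` callback.

```python
# Hypothetical keyword lists per category.
CATEGORY_KEYWORDS = {
    "finance": {"bank", "rates", "market"},
    "politics": {"election", "minister", "vote"},
    "sports": {"match", "league", "goal"},
}


def related_categories(text):
    """Automated step: list every category sharing a keyword with the text."""
    # Naive whitespace tokenization; punctuation is not stripped.
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    hits = [cat for cat, score in scores.items() if score > 0]
    # Strongest overlap first, ties broken alphabetically.
    return sorted(hits, key=lambda cat: (-scores[cat], cat))


def relevant_categories(candidates, reviewer):
    """Manual step: keep only the candidates a human reviewer approves."""
    return [cat for cat in candidates if reviewer(cat)]


text = "The minister said the bank will hold rates before the election"
candidates = related_categories(text)
print(candidates)
# A real reviewer would read the item; here the decision is hard-coded.
print(relevant_categories(candidates, lambda cat: cat == "finance"))
```

The automated step errs on the side of recall by proposing every plausible category, so the reviewer only has to accept or reject a short list rather than search the whole taxonomy.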