Keyword extraction on online advertisement using clustering and classification methodology
Date of Issue: 2017-04-17
School of Computer Science and Engineering
Keyword advertising is a form of online advertising in which an advertiser pays to have an advertisement appear in the results listing when a person searches the web with a given phrase. Keyword selection is particularly important because keywords summarize the key characteristics of the advertised products and services, and they are an important factor in increasing the reach of the advertisement (ad) and, potentially, its conversion rate. At my company, Optimate, we provide services that help clients optimize their online marketing campaigns, advertisement placements, and customer reach across multiple channels such as Google AdWords and Facebook. Keyword selection remains a crucial component in increasing the overall effectiveness and efficiency of these services. In this report, I propose a new approach to extracting keywords from advertisement text that considers the grammatical pattern of the text, historical ads, and other attributes such as industry and objective. The approach can be broadly divided into three phases: keyword candidate generation, clustering using K-Means, and K-nearest-neighbour classification. The selection rules for keyword candidates are based on linguistic features and the Part-of-Speech (POS) patterns of the ad content. The aim of candidate generation is to produce a comprehensive list of possible keywords for subsequent classification. K-Means clustering divides the ads into groups, and the subsequent classification is performed only within the group to which the ad belongs. This reduces the computational complexity and selects the group most likely to yield good keywords. The TF-IDF features of the keyword candidates are then analysed, and the cosine distance is computed and used as input to the K-nearest-neighbour classification. Based on the majority vote of the 20 nearest neighbour keywords, each candidate keyword is classified as either a true keyword or a false keyword.
This approach achieves good results in extracting keywords, although some issues still limit its effectiveness. Nevertheless, it offers a quick, highly flexible, and easy-to-implement solution to keyword extraction.
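As a rough illustration of the final classification phase described above, the following minimal Python sketch weights keyword candidates with TF-IDF, measures cosine distance between them, and takes a majority vote among the nearest labelled neighbours. It is not the implementation from the report. The function names (`tf_idf_vectors`, `cosine_distance`, `knn_is_keyword`), the smoothed IDF variant, the choice to represent each candidate as a bag of its own tokens, and the toy phrases and labels are all assumptions made for illustration. The labelled examples are assumed to come from the K-Means group that the new ad falls into. The report uses 20 neighbours; the toy data uses 3.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Return one sparse TF-IDF dict per token list (smoothed IDF, an assumed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def knn_is_keyword(candidate_vec, labelled, k=20):
    """Majority vote among the k labelled vectors nearest to the candidate.

    `labelled` is a list of (vector, is_keyword) pairs. Ties count as False.
    """
    nearest = sorted(
        labelled, key=lambda item: cosine_distance(candidate_vec, item[0])
    )[:k]
    votes = sum(1 for _, is_keyword in nearest if is_keyword)
    return votes * 2 > len(nearest)


# Toy labelled candidates, assumed to come from the same K-Means group.
phrases = [
    ["running", "shoes"], ["trail", "running", "shoes"], ["free", "shipping"],
    ["shop", "now"], ["leather", "shoes"], ["limited", "offer"],
]
labels = [True, True, False, False, True, False]

# Vectorise the labelled phrases together with the new candidate.
candidate = ["kids", "shoes"]
vectors = tf_idf_vectors(phrases + [candidate])
labelled = list(zip(vectors[:-1], labels))

result = knn_is_keyword(vectors[-1], labelled, k=3)
print(result)  # True for this toy data
```

In this toy run, the three nearest neighbours of the candidate are the three labelled phrases that share the token "shoes", all of which are marked as true keywords, so the vote classifies the candidate as a true keyword. Restricting `labelled` to one cluster is what keeps the neighbour search small, which matches the motivation given for the clustering phase.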
DRNTU::Library and information science
Final Year Project (FYP)
Nanyang Technological University