Following is the template approved for M. Tech. synopsis submission work:
SYNOPSIS OF M. TECH. DISSERTATION PROGRAM NAME
Name of the College: K.I.T.’s College of Engineering, Kolhapur
Name of the Course: M. Tech. (Program)
Name of the Student: Miss Vaishnavi Pravin Kshirsagar
Date of Admission: 18/08/201
Name of the Guide: Mrs. D. K. Jadhav, Assistant Professor, Department of Computer Science & Engineering,
K.I.T.’s College of Engineering, Kolhapur.
7. Name of the Co-Guide: –
Proposed Title: NetSpam: A Network-based Spam Detection Framework for Reviews in Online Social Media
Type of project: Non-sponsored
Name of industry/:-
Social media plays an important role in daily life. Online social media portals in particular are central to today’s digital world: they give sellers a platform for advertising campaigns and give buyers a platform for choosing products and services. Over the past few years, customers’ decision-making has come to depend heavily on written reviews, with positive reviews encouraging and negative reviews discouraging people in their selection of products and services. Written reviews also help sellers and service providers improve the quality of their products and services. Reviews and ratings have therefore become an important factor in the success of a business: positive reviews benefit a supplier or company, whereas negative reviews leave a bad impression and can cause economic losses.
Companies treat reviews as product feedback. There are two types of reviews: 1) text reviews and 2) ratings. Because anyone, under any identity, can write a review, the review system offers a golden opportunity to spammers, who intentionally write fake reviews to mislead users’ opinions and choices. 90% of people make their purchase choices on the basis of reviews written by other buyers. Spammers are therefore often hired or enticed by companies to write fake reviews that promote their products and services and/or draw customers away from competitors. As a result, the review system has become an easy target for misleading customers.
11.2 Literature review:
Reviews are used extensively by customers to make purchase decisions and by companies and organizations to make business decisions [1]. Some reviews are written to change customers’ perception of how good a product or service is; such reviews are considered spam reviews [5]. One element of the proposed approach is a classifier that calculates feature weights, which show each feature’s level of importance in determining spam reviews. The general concept of the proposed framework is to model a given review dataset as a Heterogeneous Information Network (HIN) and to map the problem of spam detection onto a HIN classification problem [4].
There are generally three types of spam reviews:
1) Untruthful opinion spam
2) Reviews on brands only
3) Non-reviews
Spam detection can be regarded as a classification problem with two classes, spam and non-spam [6].
There are three main types of information related to a review:
1) The content of review,
2) The reviewer who wrote the review,
3) The product being reviewed.
There are three types of features:
1) Review-centric features,
2) Reviewer-centric features,
3) Product-centric features [9].
Further, these features are classified as behavioral or linguistic. Content spam adds irrelevant or remotely relevant words to target pages to fool search engines into ranking those pages highly [7]. There are a large number of duplicate and near-duplicate reviews, which are detected using machine learning algorithms [8].
11.3 Problem definition:
To develop a software system that organizes user reviews on the basis of behavioral and linguistic features, implements a generic graph-based algorithm to determine the weights of features, classifies test reviews into spam and non-spam categories, and tests and analyzes the performance against standard benchmarks.
Based on the literature review, the following research gaps are identified:
1) It is hard to identify a singleton review as spam or non-spam [4].
2) Classifying users is difficult because one user may have more than one account [6].
3) Reviews given in the form of (star) ratings are difficult to recognize as fake [10].
4) A review written by a spammer that is true-positive is not classified as a spam review [11].
Considering the stated research gaps, the following objectives are defined for the proposed study:
1) To organize user reviews on the basis of behavioral and linguistic features.
2) To implement a generic graph-based algorithm to determine the weights of features.
3) To classify test reviews into spam and non-spam categories.
4) To test and analyze the performance against standard benchmarks.
The proposed methodology is described in four phases as follows:
1) Prior Knowledge
2) Network Schema Definition
3) Metapath Definition and Creation
4) Classification
1) Prior Knowledge:
This phase computes the initial probability of a review being spam. The proposed framework works in two versions (semi-supervised and unsupervised):
In the unsupervised learning method, the initial probability of a review being spam is calculated from each feature in the set of features.
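The unsupervised prior can be illustrated with a minimal Python sketch. The synopsis does not state how the per-feature probabilities are combined, so the simple average used here is an assumption, and the function name and feature names are hypothetical.

```python
from statistics import mean


def initial_spam_probability(feature_scores):
    """Combine per-feature spam probabilities into one prior for a review.

    feature_scores maps a feature name to the probability (in [0, 1]) that
    the review is spam according to that feature alone.
    ASSUMPTION: the prior is the plain average of the per-feature values;
    the synopsis does not specify the combination rule.
    """
    if not feature_scores:
        raise ValueError("at least one feature score is required")
    for name, score in feature_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
    return mean(feature_scores.values())


# Hypothetical feature names, for illustration only.
prior = initial_spam_probability({"rating_deviation": 0.25, "burstiness": 0.75})
```

With the two illustrative scores above, the prior is their midpoint, 0.5.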
2) Network Schema Definition:
The list of spam features, which determines the features engaged in spam detection, is used to design the network schema.
3) Metapath Definition and Creation:
The metapath is calculated at this phase.
A metapath is a sequence of relations in the network schema. Each path is established using one of the features used in the framework.
The level of spam certainty for each metapath is calculated in this phase from the corresponding feature:
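$$m_u^{p_l} = \frac{\lfloor s \times f(x_{lu}) \rfloor}{s}$$
where $m_u^{p_l}$ is the metapath value of review $u$ for metapath $p_l$, $s$ is the number of spamicity levels, and $f(x_{lu})$ is the probability of review $u$ being spam according to feature $l$.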
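As a small worked illustration of this calculation, the sketch below quantizes a per-feature spam probability into one of s discrete levels by flooring s times the probability and dividing by s. The function and parameter names are hypothetical and chosen only for this example.

```python
import math


def metapath_value(prob, levels):
    """Quantize a spam probability into one of `levels` discrete steps.

    prob   -- probability in [0, 1] that the review is spam per one feature
    levels -- number of spamicity levels (s), a positive integer
    """
    if levels < 1:
        raise ValueError("levels must be a positive integer")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    return math.floor(levels * prob) / levels


# With 5 levels, 0.42 falls into the step that starts at 0.4.
example = metapath_value(0.42, levels=5)
```

Probabilities that fall into the same step receive the same metapath value, which is what later allows two reviews to be compared through a feature.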
After computing the levels of spam certainty for all reviews and metapaths, two reviews that have the same value for a given metapath (feature) are connected, creating a link in the review network.
In the next step, using a higher number of levels increases the number of metapaths for each feature. Reviews can be connected to each other through these features.
The spamicity of the review with the maximum number of levels is calculated.
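The linking step can be sketched in Python as follows. This is an illustrative sketch for a single feature, not the framework's implementation: the review identifiers and probabilities are made up, and the helper name is hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def link_reviews(feature_probs, levels):
    """Link every pair of reviews that share a metapath value.

    feature_probs -- {review_id: spam probability for one feature}
    levels        -- number of spamicity levels (s)
    Returns a set of (review_a, review_b) pairs with review_a < review_b.
    """
    buckets = defaultdict(list)
    for review_id, prob in feature_probs.items():
        # Same quantization as the metapath formula: floor(s * f) / s.
        buckets[math.floor(levels * prob) / levels].append(review_id)
    links = set()
    for members in buckets.values():
        links.update(combinations(sorted(members), 2))
    return links


# Made-up probabilities for four reviews under one feature.
probs = {"r1": 0.42, "r2": 0.48, "r3": 0.91, "r4": 0.97}
coarse = link_reviews(probs, levels=2)   # r1-r2 and r3-r4 share a level
fine = link_reviews(probs, levels=20)    # every review gets its own level
```

In this toy data, two levels produce two links, while twenty levels separate all four reviews and produce none, so the choice of s controls how densely the review network is connected.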
4) Classification:
This phase consists of two steps:
Weight calculation, which determines the importance of each spam feature in spotting spam reviews.
Labeling, which calculates the final probability of each review being spam.
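The two steps can be illustrated with the Python sketch below. The synopsis does not give the formulas, so both are assumptions made for illustration: a feature's weight is taken as the link-weighted agreement of the prior spam probabilities at both ends of its links, and a review's final probability is a noisy-OR combination over features, averaged over its linked neighbours. All names are hypothetical.

```python
def feature_weight(links, prior):
    """Weight of one feature (ASSUMED formula, see lead-in).

    links -- {(u, v): metapath value shared by reviews u and v}
    prior -- {review_id: initial spam probability}
    """
    total = sum(links.values())
    if total == 0:
        return 0.0
    agreement = sum(m * prior[u] * prior[v] for (u, v), m in links.items())
    return agreement / total


def spam_probability(review, links_by_feature, weights):
    """Final spam probability of one review (ASSUMED noisy-OR labeling)."""
    not_spam = {}  # neighbour -> product of (1 - m * weight) over features
    for feature, links in links_by_feature.items():
        for (u, v), m in links.items():
            if review in (u, v):
                other = v if u == review else u
                not_spam[other] = not_spam.get(other, 1.0) * (1.0 - m * weights[feature])
    if not not_spam:
        return 0.0  # ASSUMPTION: an unlinked review gets probability 0
    return sum(1.0 - p for p in not_spam.values()) / len(not_spam)


# Toy data: one feature "f1" linking r1 and r2 at metapath value 0.5.
links_by_feature = {"f1": {("r1", "r2"): 0.5}}
prior = {"r1": 1.0, "r2": 1.0, "r3": 0.0}
weights = {f: feature_weight(l, prior) for f, l in links_by_feature.items()}
p_r1 = spam_probability("r1", links_by_feature, weights)
p_r3 = spam_probability("r3", links_by_feature, weights)
```

In this toy case the single feature receives weight 1.0 because both linked reviews have a prior of 1.0, which gives r1 a final probability of 0.5, while the unlinked r3 stays at 0.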
11.6 Activity chart:
Month    Activity                                                     Days
Jul-18   Project Kickoff                                              5
Aug-18   Data Gathering                                               15
Sep-18   Extraction of feature set and gathering of review data set   25
Oct-18   Design                                                       27
Nov-18   Development                                                  30
Dec-18   Compute prior knowledge                                      35
Jan-19   Development                                                  15
Feb-19   Compute network schema                                       25
Mar-19   Metapath creation                                            30
Apr-19   Classification                                               30
May-19   Development                                                  25
Jun-19   Quality Assurance                                            25
Jul-19   Rollout and Maintenance                                      10
11.7 Cost estimation: 20000
11.8 Resources required: NetBeans, XAMPP
Signature of Student
Signature of Guide
[1] J. Donfro. A whopping 20% of Yelp reviews are fake. http://www.businessinsider.com/20-percent-of-yelp-reviews-fake-2013-9. Accessed: 2015-07-30.
[2] M. Ott, C. Cardie, and J. T. Hancock. Estimating the prevalence of deception in online review communities. In ACM WWW, 2012.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock. Finding deceptive opinion spam by any stretch of the imagination. In ACL, 2011.
[4] Ch. Xu and J. Zhang. Combating product review spam campaigns via multiple heterogeneous pairwise features. In SIAM International Conference on Data Mining, 2014.
[5] N. Jindal and B. Liu. Opinion spam and analysis. In WSDM, 2008.
[6] F. Li, M. Huang, Y. Yang, and X. Zhu. Learning to identify review spam. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), 2011.
[7] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh. Exploiting burstiness in reviews for review spammer detection. In ICWSM, 2013.
[8] A. J. Minnich, N. Chavoshi, A. Mueen, S. Luan, and M. Faloutsos. TrueView: Harnessing the power of multiple review sites. In ACM WWW, 2015.
[9] B. Viswanath, M. Ahmad Bashir, M. Crovella, S. Guha, K. P. Gummadi, B. Krishnamurthy, and A. Mislove. Towards detecting anomalous user behavior in online social networks. In USENIX, 2014.
[10] H. Li, Z. Chen, B. Liu, X. Wei, and J. Shao. Spotting fake reviews via collective PU learning. In ICDM, 2014.