Akshay Bhavsar, Jitin Nair, Abhishek Upadhyay, Dnyaneshwar Bhabad
Department of Computer Engineering, VIVA Institute of Technology.
Abstract—Images are used so widely in day-to-day life that they have become an important part of everyone's life, and they have always been a popular medium for instant sharing around the globe. People today are easily influenced, and images are especially influential because they convey a great deal through visual representation and tend to be remembered longer. Images can have a positive influence, for example through motivational quote images, or a negative influence, for example through the sharing of nude images or images of killings. Retrieving images that contain an object-of-interest from the vast pool of social media images has therefore become a research interest in areas such as understanding human emotions, analyzing situations that lead to conflict, and cybercrime. The proposed system is a model for suspicious image detection: it decides whether an image is a good or a bad influence based on past and present experience.
The proposed system uses more attributes than the existing system, which makes its predictions more accurate. It is a machine learning application in which a large amount of data is analyzed and meaningful information is extracted; the system is trained on the large dataset given to it.

Keywords—social media; image retrieval; positive-negative influence

INTRODUCTION

In recent years, social media has become a place where people from every corner of the world come together and interact with each other.
The ever-increasing popularity of social media has affected people's lives in many ways, whether by making a brand's presence felt or by letting people get to know others and stay connected with them wherever they are. The most popular social media sites are Facebook, Twitter and Instagram, and all three have one thing in common: images make up a large share of the data shared on them. Through social media, people can spread information as well as their ideas and opinions.
This can have a positive influence on people, but it can also have a negative one. Removing negative content has always been a painful task, because the current system does not use any sophisticated technique to find such content; instead it relies on humans to do the work, which requires additional human resources and increases both cost and time overhead. Consider, for instance, terrorist organizations that propagate their ideas in the form of images. Because of the large number of user accounts and social groups on social media sites, and because only some users are involved in spreading negative influences, it has become increasingly difficult to identify and track such groups manually. We currently focus on images that contain objects-of-interest, such as ISIS images (Fig. 1.1) and riot images (Fig. 1.2). As Fig. 1 shows, different accounts share images with similar content but different editing styles, which makes object recognition and pre-processing difficult.

Fig. 1.1. ISIS images
Fig. 1.2. Riot images

In computer vision, object detection is considered one of the most challenging problems, as it is prone to both localization and classification errors.
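Localization error is commonly quantified with the intersection-over-union (IoU) between a predicted bounding box and a ground-truth box. The paper does not name a metric, so the following is only a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1; the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A prediction shifted by half a box width overlaps the ground truth with IoU 1/3.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
```

A detection is usually counted as correctly localized when its IoU with the ground truth exceeds a threshold such as 0.5, the convention used in the PASCAL VOC benchmark.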
The human brain makes vision seem easy. It takes no effort for humans to tell a lion from a jaguar, read a sign, or recognize a human face. But these are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.
In the last few years, the field of machine learning has made tremendous progress on these difficult problems. In particular, researchers have found that a kind of model called a deep convolutional neural network can achieve far better performance on hard visual recognition tasks than older SIFT-based methods, matching or exceeding human performance in some domains. Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models have continued to show improvements, each achieving a new state-of-the-art result: QuocNet, AlexNet, Inception (GoogLeNet) and BN-Inception-v2. Researchers both inside and outside Google have published papers describing all these models, but the results are still hard to reproduce. The next step was the release of code for running image recognition on the latest model, Inception-v3.

Inception-v3 is trained for the ImageNet Large Scale Visual Recognition Challenge using data from 2012. This is a standard task in computer vision, in which models try to classify entire images into 1000 classes such as “Zebra”, “Dalmatian” and “Dishwasher”. For example, the following figure shows the results of AlexNet classifying some images.

Fig. 1.3. Machine learning technique

Our proposed model extracts the region of interest (ROI) from an image, so that images with a negative influence can be found more easily than with previous techniques. Restricting the analysis to the ROI further increases the chance of detecting a suspicious image as it is being uploaded.
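Classifying an image into one of 1000 classes typically ends with a softmax over the network's raw output scores (logits), from which the top-k labels are reported. A minimal sketch, assuming the logits are already available as a plain list; the function names and the three example scores are hypothetical and are not Inception-v3 or AlexNet output.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["zebra", "dalmatian", "dishwasher"]
for label, p in top_k([2.0, 1.0, 0.1], labels, k=2):
    print(f"{label}: {p:.3f}")
```

With 1000 ImageNet classes the same code applies unchanged; only the length of `logits` and `labels` grows.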
Our contributions are in the following aspects:
1) a reduced network for retrieving objects-of-interest from social platforms, which reduces the amount of data required for training and the computation time;
2) a framework for detecting explicit and suspicious images as they are uploaded on social platforms.

THE PROPOSED MODEL

The proposed system builds a model for detecting suspicious images that uses R-CNN to detect objects in an image. The system takes data as input, learns from past and present experience, transforms the data and is trained on it. It uses the R-CNN algorithm, which combines a Region-Net and an Object-CNN (O-CNN). The Region-Net extracts a region from the image and produces a warped image from which the CNN can easily detect the object.
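The Region-Net step hands the O-CNN a fixed-size, warped crop of each proposed region. A minimal sketch of that crop-and-warp step, assuming the image is a 2-D list of pixel values and the region is an (x1, y1, x2, y2) box with exclusive end coordinates; nearest-neighbour sampling is used only for simplicity, and the function name `crop_and_warp` is hypothetical. A real system would more likely use a library resize with smoother interpolation.

```python
def crop_and_warp(image, box, out_h, out_w):
    """Crop box (x1, y1, x2, y2) from a 2-D image and warp it to out_h x out_w.

    Nearest-neighbour sampling; the aspect ratio is not preserved, which is
    the "warp" idea of stretching every region to the CNN's fixed input size.
    """
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for i in range(out_h):
        # Map each output row back to a source row inside the crop.
        src_y = y1 + (i * crop_h) // out_h
        row = []
        for j in range(out_w):
            # Map each output column back to a source column inside the crop.
            src_x = x1 + (j * crop_w) // out_w
            row.append(image[src_y][src_x])
        warped.append(row)
    return warped


# Toy 4x6 "image" whose pixel value encodes its row and column (row*10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
print(crop_and_warp(image, (2, 1, 4, 3), 4, 4))
```

Here a 2x2 region is stretched to 4x4, so each source pixel is repeated in a 2x2 block.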
Once an object has been detected in an image, the system decides, based on that image, whether the content may be viewed by everyone. Which images are deleted is decided from the content of the image and the dataset the model was trained on. Fig. 2.1 shows the system flow diagram for the proposed system. It shows the detailed working of the proposed model, including the flow of control: when a given condition is met, the system performs the task assigned to that condition.

Fig. 2.1. System Flow

The system flow can be divided into two main blocks: image detection and image classification. The image detection block uses a modified CNN algorithm to detect objects in images. In this modified CNN, known as R-CNN, a Region-Net generates semantic regions as object proposals, which are then given as input to the O-CNN to extract deep features. The flow of information from the Region-Net to the O-CNN lets us build an object detector. When a user uploads an image, detection is performed on it first.
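The flow from upload to decision (region proposals, per-region object detection, image-level classification, then the delete or restrict decision) can be sketched as one function. This is a structural sketch only: `propose_regions`, `detect_object` and `classify` are hypothetical stand-ins for the Region-Net, the O-CNN and the classification stage, passed in as plain functions so the control flow runs without a trained model. The category names "explicit" and "adult" follow the paper's two classes; mapping them to delete and age-restrict is an assumption based on the classification rule the paper describes.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


class Action(Enum):
    DELETE = "delete"              # explicit content
    AGE_RESTRICT = "age_restrict"  # adult content, shown only to an age group
    PUBLISH = "publish"            # nothing objectionable found


def moderate_upload(
    image,
    propose_regions: Callable[[object], List[Box]],
    detect_object: Callable[[object, Box], Optional[str]],
    classify: Callable[[List[str]], str],
) -> Action:
    """Region proposals -> per-region detection -> image category -> action."""
    detections = []
    for box in propose_regions(image):
        label = detect_object(image, box)  # None: no object-of-interest here
        if label is not None:
            detections.append(label)
    category = classify(detections)
    if category == "explicit":
        return Action.DELETE
    if category == "adult":
        return Action.AGE_RESTRICT
    return Action.PUBLISH


# Hypothetical toy stand-ins so the control flow can be exercised end to end.
def toy_regions(img):
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def toy_detector(img, box):
    return "flag" if box == (2, 2, 4, 4) else None


def toy_classifier(labels):
    return "explicit" if "flag" in labels else "safe"


print(moderate_upload([[0]], toy_regions, toy_detector, toy_classifier))
```

Because each stage is injected, a trained Region-Net or O-CNN can later replace a toy function without changing the control flow.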
In the modified CNN algorithm, the Region-Net scans the image for the semantic regions on which the CNN will work, and the CNN then detects the object more accurately within the selected region. Further classification is based on these results.

After detection, the image is classified. CNNs have been widely used in automatic image classification systems. In most cases, features from the top layer of the CNN are used for classification; however, those features may not contain enough useful information to predict an image's class correctly.
In some cases, features from a lower layer carry more discriminative power than those from the top layer. Applying features from only one specific layer to classification therefore does not exploit the full discriminative power the CNN has learned, which motivates fusing features from multiple layers. To address this problem, a method is used that combines features from multiple layers of a given CNN model, reusing CNN models already trained on training images to extract features from those layers.
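One common way to fuse features from several layers is to global-average-pool each layer's feature maps into a vector, L2-normalize each vector so that no single layer dominates by scale, and concatenate the results for a downstream classifier such as an SVM. The paper does not give its exact fusion rule, so this is a sketch under that assumption; the function names are hypothetical and feature maps are represented as nested lists of shape [channels][height][width].

```python
import math


def global_avg_pool(feature_map):
    """[channels][h][w] -> one mean activation per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_layers(feature_maps):
    """Pool, normalize and concatenate features from several CNN layers."""
    fused = []
    for fmap in feature_maps:
        fused.extend(l2_normalize(global_avg_pool(fmap)))
    return fused


# Hypothetical activations: a 2-channel lower layer and a 1-channel top layer.
lower = [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
top = [[[4.0]]]
print(fuse_layers([lower, top]))
```

The fused vector's length is the total number of channels across the chosen layers, so it can be fed to any fixed-length classifier.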
The proposed fusion method is evaluated on image classification benchmark datasets using an SVM. Image classification is an important topic in artificial vision systems and has drawn a significant amount of interest over the last decades; the field aims to classify an input image based on its visual content. Many researchers have relied on hand-crafted features such as HoG, with classifiers such as SVMs, random forests and decision trees applied to the extracted features to make the final decision.

In image classification, once the algorithm has determined which category an image belongs to, the image is uploaded or deleted according to the result. Images with explicit content are deleted, while images with adult content are uploaded but made available only to the appropriate age group.

EXPERIMENT & RESULTS

Our team collected two main social media image datasets, each with specific objects-of-interest. ISIS images, riots, murders and nude images were chosen as explicit content.
Adult-content images formed the second main class. These images were collected by crawling various social media websites. The datasets were chosen to classify content on social media websites and to expose suspicious involvement and activity on those sites. For example, a number of non-trivial accounts on social media websites are actively involved in distributing explicit content such as ISIS images, riot images and nude content, which is not appropriate for anyone.
Our current research focuses on expanding our understanding of the network of accounts sharing explicit material, such as ISIS-related and riot-related accounts, by exploring this kind of data programmatically. One of the targeted goals is to effectively identify who is “in the network”. Previously, the use of social media image data relied entirely on text analysis. These images have been collected by the UAB Computer Forensics Research Laboratory. The training data comprised three positive datasets, with a positive-image ratio of 3:1 in each.
In practice, ‘positive’ images on social media sites, i.e., images containing objects-of-interest, make up only a very small portion, and the majority are negative. Therefore, to resemble the real scenario, the positive-to-negative ratio during testing is 1:10, with the understanding that the real-world imbalance is likely even greater. During training, however, the positive-to-negative ratio is 1:1, so that the greater diversity of the negative set does not overwhelm the positive examples, which also reduces the risk of overfitting.
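The two ratios can be produced by subsampling a pool of negatives for each split. A minimal sketch, assuming positives and negatives have already been split into disjoint train and test pools and that a fixed random seed is acceptable; `sample_with_ratio` is a hypothetical helper, and the pool sizes below are illustrative, not the paper's counts.

```python
import random


def sample_with_ratio(positives, negatives, neg_per_pos, seed=0):
    """Label all positives 1 and add neg_per_pos negatives (label 0) per positive.

    Negatives are drawn without replacement; the combined list is shuffled.
    """
    n_neg = len(positives) * neg_per_pos
    if n_neg > len(negatives):
        raise ValueError("not enough negatives for the requested ratio")
    rng = random.Random(seed)
    chosen = rng.sample(negatives, n_neg)
    data = [(x, 1) for x in positives] + [(x, 0) for x in chosen]
    rng.shuffle(data)
    return data


# Disjoint pools so no image appears in both training and testing.
pos_train = [f"pos_{i}" for i in range(10)]
pos_test = [f"pos_{i}" for i in range(10, 15)]
neg_train = [f"neg_{i}" for i in range(100)]
neg_test = [f"neg_{i}" for i in range(100, 200)]

train = sample_with_ratio(pos_train, neg_train, neg_per_pos=1)   # 1:1 for training
test = sample_with_ratio(pos_test, neg_test, neg_per_pos=10)     # 1:10 for testing
print(len(train), len(test))  # 20 55
```

Keeping the pools disjoint before sampling avoids evaluating on images the model was trained on.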
Our model is fine-tuned on top of a distilled model for each dataset for 100,000 iterations. Experiments were carried out to validate the effectiveness of our framework in retrieving images by objects-of-interest. The framework is compared with the extended Generalized Hough Transform (GHT), which has been reported to compare favorably with SURF. Our framework is also compared with a full-size pre-trained CNN fine-tuned on the social media image sets; for this we used the best CNN model trained on ImageNet, whose best result was obtained after 320,000 iterations. In our dataset, there are two positive datasets and one negative dataset in total.
We therefore also experimented with keeping the two positive categories separate while representing all negative images (i.e., images that do not contain any object-of-interest) as one single category, and trained one single CNN. The CNN experiments were carried out on a single CPU, an Intel Core i5 processor. All implementations involving CNNs were carried out using TensorFlow. In total, the dataset contains 1000 images, divided into 500 explicit images and 500 adult-content images. For our framework we also created a confusion matrix, which is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. There are two possible classes: “yes” and “no”.
If we were predicting the presence of explicit or adult content, for example, “yes” would mean that an image belongs to the class and “no” that it does not. The classifier made a total of 150 predictions (i.e., 150 images were tested for the presence of adult or explicit content). Of those 150 images, the classifier predicted “yes” 105 times and “no” 45 times. In reality, 100 images in the sample contain explicit content and 50 do not.
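These counts are the margins of a 2x2 confusion matrix. The individual cells are not reported, so the sketch below does not invent them: it shows how the matrix is tallied from labels, and which true-positive counts are consistent with the reported margins (150 predictions, 105 predicted “yes”, 100 actually “yes”). Function names are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Tally a binary confusion matrix; labels are True ("yes") / False ("no")."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def feasible_tp(total, predicted_yes, actual_yes):
    """Range of TP values consistent with the matrix margins.

    FP = predicted_yes - TP, FN = actual_yes - TP and
    TN = total - predicted_yes - actual_yes + TP must all be >= 0.
    """
    low = max(0, predicted_yes + actual_yes - total)
    high = min(predicted_yes, actual_yes)
    return low, high


low, high = feasible_tp(total=150, predicted_yes=105, actual_yes=100)
print(low, high)  # 55 100
for tp in (low, high):
    tn = 150 - 105 - 100 + tp
    print(tp, "accuracy:", (tp + tn) / 150)
```

From the margins alone, accuracy can lie anywhere between 55/150 and 145/150, so the four cell counts are needed before accuracy, precision or recall can be reported.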
Fig. 3.1. Confusion Matrix

DISCUSSION

The main goal of the system is to detect images that are unsuitable for viewing and that influence people in a negative way, making social media a safe place to inspire people and spread positivity without its negative aspects. Our framework, based on a region-CNN model, gives significantly better results than previous research carried out by other groups.
Our system can help prevent the violence and anger that a negative image can provoke among people who react without knowing the whole story, and it can keep people from feeling depressed because of the negative influence of harmful images. Previous models used a small set of parameters, and parameter tuning was not possible with so few parameters, so they could not cover the whole context of an image; our system overcomes this limitation. The problem with a plain CNN was that it took the whole image and classified it, instead of targeting the negative part within the image, which increases computational time and complexity. The first requirement was therefore a way to select the relevant image features automatically, without manual selection every time an image is processed. Object detection is one of the fundamental problems in computer vision, because most object recognition methods rely on edge-based feature extraction. With a region-based CNN, the image is first divided into regions, and the negative regions are classified according to the trained system. This gives significantly better results than the previous model and saves time and resources.
Our current results using the region-based CNN model show significant improvement in image retrieval compared with older techniques such as hand-picked features. Social media is such an important means of communication that it offers a large sample of information about a person's characteristics: if a person repeatedly shares depressive quote images, our system can find them and notify the concerned authorities, which could help that person overcome their depression without affecting others.

Fig. 4.1. Social Media Usage

Social media information comes with its own challenges. These include recording the data, since some items are small while others are large; differences in how people from different parts of the world convey things (e.g., a hand gesture that is offensive in one region may be an ordinary way to communicate in another); the popularity of meme photos, which makes it hard to tell whether an image conveys positivity through sarcasm or spreads hatred; and, last but not least, assessing the validity of the data. Moreover, working with the region-based CNN helped us reduce the computational time required for the intended task, which helped us produce better results. In this experimental setup we did not follow the common practice; instead we represented negative images in different classes. However, it must be understood that covering all negative images is never possible, because each negative image is negative in its own way.

CONCLUSION

The proposed system implements a technique for detecting suspicious images using R-CNN, applied to a pre-processed dataset collected from images of various categories.
The drawback of using a distilled model is overcome through parameter tuning. Our framework uses an R-CNN-based model that proved better able to capture the features of each object-of-interest in a binary-class model. Additionally, our system uses a distilled R-CNN model, derived from a full-size pre-trained R-CNN model, which also contributes to the improvement in retrieval precision and performance.
This parameter tuning will help the system detect explicit images more efficiently. One of the concerns with deep learning models is parameter tuning. In our case, even though we did not experiment with tuning the parameters, the system still achieved a notable improvement.
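The paper does not state how its distilled model was obtained. Assuming standard knowledge distillation (Hinton et al.), a smaller student network is trained to match the temperature-softened output distribution of the full-size teacher. The sketch below shows only that soft-target loss on plain lists of logits; the function names, the temperature of 4.0 and the example logits are hypothetical.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax with temperature; a higher T gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of student soft predictions against teacher soft targets.

    Multiplied by T**2, as in Hinton et al., so that the soft-target gradient
    magnitude stays comparable when the temperature changes.
    """
    p_teacher = softmax_t(teacher_logits, temperature)
    p_student = softmax_t(student_logits, temperature)
    ce = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return ce * temperature ** 2


teacher = [5.0, 1.0, -2.0]
print(distillation_loss([4.0, 1.5, -1.0], teacher))
# A student that exactly matches the teacher reaches the minimum for this teacher.
print(distillation_loss(teacher, teacher))
```

In practice this soft-target term is usually combined with an ordinary cross-entropy loss on the hard labels.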
In future work, we will experiment with automating the tuning of various parameters. In our current setup, we only used a distilled model that is half the size of the larger model.

REFERENCES

Suryani, Dewi, Patrick Doetsch, and Hermann Ney. “On the Benefits of Convolutional Neural Network Combinations in Offline Handwriting Recognition.” Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on. IEEE, 2016.
Oquab, Maxime, et al. “Learning and transferring mid-level image representations using convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Bappy, Jawadul H., and Amit K. Roy-Chowdhury. “CNN based region proposals for efficient object detection.” Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016.
Wang, Limin, et al. “Better exploiting OS-CNNs for better event recognition in images.” Proceedings of the IEEE International Conference on Computer Vision Workshops. 2015.
Zhou, Wengang, et al. “SIFT match verification by geometric coding for large-scale partial-duplicate web image search.” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 9.1 (2013): 4.
Iwasa, Hayato, et al. “Facial features extraction by accelerated implementation of circular Hough transform and appearance evaluation.” Frontiers of Computer Vision (FCV), 2015 21st Korea-Japan Joint Workshop on. IEEE, 2015.
Jia, Yangqing, et al. “Caffe: Convolutional architecture for fast feature embedding.” Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014.
Abadi, Martín, et al. “TensorFlow: Large-scale machine learning on heterogeneous distributed systems.” arXiv preprint arXiv:1603.04467 (2016).
LeCun, Yann, Fu Jie Huang, and Leon Bottou. “Learning methods for generic object recognition with invariance to pose and lighting.” Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. Vol. 2. IEEE, 2004.
Mishkin, Dmytro, Nikolay Sergievskiy, and Jiri Matas. “Systematic evaluation of CNN advances on the ImageNet.” arXiv preprint arXiv:1606.02228 (2016).
Karpathy, Andrej, et al. “Large-scale video classification with convolutional neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Niaf, Emilie, et al. “SVM with feature selection and smooth prediction in images: Application to CAD of prostate cancer.” Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems.