Crop Yield Prediction and Efficient use of Fertilizers

ABSTRACT:

India is an agricultural country, and its economy depends predominantly on agricultural yield and agro-industry products. Data mining is an emerging research field in crop yield analysis, and yield prediction is an important problem in agriculture: every farmer wants to know how much yield to expect. The system analyzes related attributes such as location and soil pH, from which the alkalinity of the soil is determined, along with the percentages of nutrients such as nitrogen (N), phosphorus (P), and potassium (K). Location is combined with third-party APIs for weather and temperature, so that the soil type, the nutrient value of the soil in that region, the amount of rainfall, and the soil composition can be determined. All of these attributes are analyzed and the data is trained with suitable machine learning algorithms to build a model. The resulting model predicts crop yield precisely and accurately and gives the end user recommendations about the required fertilizer ratio based on the atmospheric and soil parameters of the land, which helps increase crop yield and farmer revenue.
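
ILLUSTRATIVE SKETCH (PYTHON):

The paper does not commit to a single algorithm, so the sketch below is only a minimal illustration of the idea: a regression model trained on soil and weather attributes (N, P, K, pH, rainfall, temperature) to predict yield, with feature importances hinting at the fertilizer-related parameters. The file name and column names are assumptions made for the example.

# Minimal sketch: predict crop yield from soil and weather attributes.
# "crop_data.csv" and its column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("crop_data.csv")   # assumed columns: N, P, K, pH, rainfall, temperature, yield
X = df[["N", "P", "K", "pH", "rainfall", "temperature"]]
y = df["yield"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
# Feature importances indicate which soil/weather parameters drive yield,
# which can inform the recommended N-P-K fertilizer ratio.
print(dict(zip(X.columns, model.feature_importances_.round(3))))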

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

S. Bhanumathi, M. Vineeth and N. Rohit, “Crop Yield Prediction and Efficient use of Fertilizers”, IEEE International Conference on Communication and Signal Processing, April 4-6, 2019.

Spammer Detection and Fake User Identification on Social Networks

ABSTRACT:

Social networking sites engage millions of users around the world. Users' interactions with social sites such as Twitter and Facebook have a tremendous impact on, and occasionally undesirable repercussions for, daily life. Prominent social networking sites have become target platforms for spammers to spread huge amounts of irrelevant and harmful information. Twitter, for example, has become one of the most heavily used platforms of all time and therefore attracts an outsized amount of spam. Fake users send unwanted tweets to promote services or websites, which not only affects legitimate users but also wastes resources. Moreover, the possibility of spreading invalid information to users through fake identities has increased, resulting in the circulation of harmful content. Recently, the detection of spammers and identification of fake users on Twitter has become a common area of research in contemporary online social networks (OSNs). In this paper, we review the techniques used for detecting spammers on Twitter. Moreover, we present a taxonomy of Twitter spam detection approaches that classifies the techniques by their ability to detect: (i) fake content, (ii) URL-based spam, (iii) spam in trending topics, and (iv) fake users. The presented techniques are also compared on various features, such as user features, content features, graph features, structure features, and time features. We hope the presented study will be a useful resource for researchers to find the highlights of recent developments in Twitter spam detection on a single platform.
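
ILLUSTRATIVE SKETCH (PYTHON):

Since this is a survey, no single detection method is prescribed; the sketch below merely illustrates the common feature-based setup that many of the surveyed techniques share, with account-level and content-level features fed to a supervised classifier. The CSV file and column names are hypothetical.

# Minimal sketch of a feature-based spammer classifier of the kind surveyed above.
# "twitter_accounts.csv" and its columns are illustrative assumptions, not the paper's data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("twitter_accounts.csv")   # assumed columns: followers, followees, tweets_per_day,
                                           # urls_per_tweet, account_age_days, is_spammer
features = ["followers", "followees", "tweets_per_day", "urls_per_tweet", "account_age_days"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["is_spammer"], test_size=0.2, random_state=42, stratify=df["is_spammer"])

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))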

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Faiza Masood, Ghana Ammad, Ahmad Almogren, Assad Abbas, Hasan Ali Khattak, Ikram Ud Din, Mohsen Guizani and Mansour Zuair, “Spammer Detection and Fake User Identification on Social Networks”, IEEE Access, 2019.

Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media

ABSTRACT:

Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking a medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
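
ILLUSTRATIVE SKETCH (PYTHON):

As a minimal, hedged illustration of the feature-construction step, the sketch below builds word2vec embeddings from tokenized drug-review sentences and trains one of the baseline classifiers mentioned above (an SVM) with class weighting to cope with the heavy imbalance. The placeholder corpus and labels are assumptions; the paper's deep models and context features are not reproduced here.

# Minimal sketch: word2vec sentence features + an SVM baseline with class weighting.
# The placeholder corpus below stands in for the WebMD drug-review sentences.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

sentences = [["this", "drug", "also", "relieved", "my", "migraines"],
             ["took", "it", "for", "blood", "pressure"]] * 200    # placeholder corpus
labels = np.array([1, 0] * 200)                                   # 1 = serendipitous usage

w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, seed=42)

def embed(tokens):
    # Average the word vectors of a sentence; words unseen in training are skipped.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

X = np.vstack([embed(s) for s in sentences])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=42, stratify=labels)

# class_weight="balanced" mitigates the heavy imbalance (~2.8% positives in the paper's data).
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))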

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Boshu Ru et al., “Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage from Social Media”, IEEE Transactions on NanoBioscience, 2019.

Sentiment Classification using N-gram IDF and Automated Machine Learning

ABSTRACT:

We propose a sentiment classification method with a general machine learning framework. For feature representation, n-gram IDF is used to extract software-engineering related, dataset-specific, positive, neutral, and negative n-gram expressions. For classifiers, an automated machine learning tool is used. In the comparison using publicly available datasets, our method achieved the highest F1 values in positive and negative sentences on all datasets.
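
ILLUSTRATIVE SKETCH (PYTHON):

The sketch below is a rough stand-in for the pipeline described above: standard TF-IDF over word n-grams replaces the paper's n-gram IDF weighting, and a small grid search replaces a full automated machine learning tool. The toy corpus is an assumption made only so the example runs.

# Rough stand-in: n-gram features with IDF weighting + automated hyper-parameter search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["works great, very happy with this fix",
         "this build is broken and useless",
         "documentation updated",
         "crashes every time, terrible"] * 25          # toy software-engineering sentences
labels = ["positive", "negative", "neutral", "negative"] * 25

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# GridSearchCV stands in for an AutoML tool: it picks the best configuration automatically.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, scoring="f1_macro", cv=5)
search.fit(texts, labels)
print("best params:", search.best_params_, "cv macro-F1:", round(search.best_score_, 3))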

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Rungroj Maipradit, Hideki Hata and Kenichi Matsumoto, “Sentiment Classification using N-gram IDF and Automated Machine Learning”, IEEE Software, 2019.

SentiDiff: Combining Textual Information and Sentiment Diffusion Patterns for Twitter Sentiment Analysis

ABSTRACT:

Twitter sentiment analysis has become a hot research topic in recent years. Most existing solutions to Twitter sentiment analysis consider only the textual information of Twitter messages and struggle to perform well on short and ambiguous messages. Recent studies show that sentiment diffusion patterns on Twitter are closely related to the sentiment polarities of Twitter messages. Therefore, in this paper we focus on how to fuse the textual information of Twitter messages with sentiment diffusion patterns to obtain better sentiment analysis performance on Twitter data. To this end, we first analyze sentiment diffusion by investigating a phenomenon called sentiment reversal, and find some interesting properties of sentiment reversals. Then we consider the inter-relationships between the textual information of Twitter messages and sentiment diffusion patterns, and propose an iterative algorithm called SentiDiff to predict the sentiment polarities expressed in Twitter messages. To the best of our knowledge, this work is the first to utilize sentiment diffusion patterns to improve Twitter sentiment analysis. Extensive experiments on a real-world dataset demonstrate that, compared with state-of-the-art sentiment analysis algorithms based on textual information alone, our proposed algorithm yields PR-AUC improvements of between 5.09% and 8.38% on Twitter sentiment classification tasks.
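
ILLUSTRATIVE SKETCH (PYTHON):

SentiDiff's exact update rule is defined in the paper; the sketch below only illustrates the general idea in a simplified, label-propagation-like form: each tweet's textual sentiment score is blended with the scores of the tweets it was diffused from (e.g., via retweets), iterating until the scores stabilize. All names, values, and the mixing weight alpha are assumptions for the example.

# Simplified illustration only, not SentiDiff's actual update rule.
import numpy as np

# p_text[i]  : probability that tweet i is positive, from any textual classifier (assumed given)
# parents[i] : indices of the tweets from which tweet i was diffused (retweet/reply chain)
p_text  = np.array([0.9, 0.4, 0.55, 0.2])
parents = {0: [], 1: [0], 2: [0, 1], 3: [2]}
alpha = 0.7                                   # weight of textual evidence vs diffusion evidence

p = p_text.copy()
for _ in range(20):                           # iterate until the fused scores stop changing
    p_new = p.copy()
    for i, pa in parents.items():
        if pa:
            p_new[i] = alpha * p_text[i] + (1 - alpha) * p[pa].mean()
    if np.allclose(p_new, p, atol=1e-6):
        break
    p = p_new

print("fused positive-sentiment scores:", p.round(3))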

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Lei Wang, Jianwei Niu and Shui Yu, “SentiDiff: Combining Textual Information and Sentiment Diffusion Patterns for Twitter Sentiment Analysis”, IEEE Transactions on Knowledge and Data Engineering, 2019.

Prediction of Heart Disease Using Machine Learning Algorithms

ABSTRACT:

The health care field has a vast amount of data, and certain techniques are needed to process it; data mining is one of the techniques often used. Heart disease is the leading cause of death worldwide. This system predicts the likelihood of heart disease and reports the chance of its occurrence as a percentage. The datasets used are classified in terms of medical parameters, and the system evaluates those parameters using data mining classification techniques. The datasets are processed in Python using two machine learning algorithms, Decision Tree and Naïve Bayes, and the system reports which of the two achieves the better accuracy for heart disease prediction.
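
ILLUSTRATIVE SKETCH (PYTHON):

A minimal sketch of the comparison described above, assuming a CSV export of a standard heart-disease dataset with a binary target column; both models are trained on the same split and their test accuracies are printed side by side.

# Minimal sketch: Decision Tree vs. Naïve Bayes on medical parameters.
# "heart.csv" and its "target" column are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("heart.csv")                       # assumed: numeric features + binary "target"
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"], test_size=0.2, random_state=42)

for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=42)),
                    ("Naive Bayes", GaussianNB())]:
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%} accuracy")            # shows which of the two performs better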

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Santhana Krishnan J. and Geetha S., “Prediction of Heart Disease Using Machine Learning Algorithms”, 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), IEEE, 2019.

Location Inference for Non-geotagged Tweets in User Timelines

ABSTRACT:

Social media like Twitter have become globally popular in the past decade. Thanks to the high penetration of smartphones, social media users are increasingly going mobile. This trend has helped foster various location-based services deployed on social media, whose success heavily depends on the availability and accuracy of users' location information. However, only a very small fraction of tweets on Twitter are geo-tagged, so it is necessary to infer locations for tweets in order to serve those location-based services. In this paper, we tackle this problem by scrutinizing Twitter user timelines in a novel fashion. First, we split each user's tweet timeline temporally into a number of clusters, each tending to imply a distinct location. Subsequently, we adapt two machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. The Bayes-based model focuses on the information gain of words with location implications in the user-generated content. The convolutional LSTM model treats user-generated content and the associated locations as sequences, and employs bidirectional LSTM and convolution operations to make location inferences. The two models are evaluated on a large set of real Twitter data. The experimental results suggest that our models are effective at inferring locations for non-geotagged tweets and significantly outperform state-of-the-art and alternative approaches in terms of inference accuracy.
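
ILLUSTRATIVE SKETCH (PYTHON):

The sketch below is a much simplified version of the Bayes-based branch only: a user's timeline is split into temporal clusters using a gap threshold, and each cluster's concatenated text is classified into a city with a multinomial Naive Bayes model. The gap heuristic, toy training data, and city labels are assumptions; the paper's information-gain word selection and convolutional LSTM model are not reproduced.

# Simplified sketch: temporal clustering of a timeline + Naive Bayes city classification.
from datetime import datetime, timedelta
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def split_timeline(tweets, max_gap_hours=12):
    """Group time-sorted (timestamp, text) tweets into clusters at large time gaps."""
    clusters, current = [], [tweets[0]]
    for prev, cur in zip(tweets, tweets[1:]):
        if cur[0] - prev[0] > timedelta(hours=max_gap_hours):
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return [" ".join(text for _, text in c) for c in clusters]

# Toy training data: geo-tagged tweet text labelled with its city.
train_texts = ["ferry across the harbour opera house", "tube delays on the central line"]
train_cities = ["Sydney", "London"]
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(train_texts, train_cities)

t0 = datetime(2019, 5, 1, 9, 0)
timeline = [(t0, "morning run along the harbour"),
            (t0 + timedelta(hours=1), "great coffee near the opera house"),
            (t0 + timedelta(days=2), "stuck again on the central line")]
clusters = split_timeline(timeline)
print(list(zip(clusters, clf.predict(clusters))))   # one inferred city per tweet cluster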

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Pengfei Li, Hua Lu, Nattiya Kanhabua, Sha Zhao and Gang Pan, “Location Inference for Non-geotagged Tweets in User Timelines”, IEEE Transactions on Knowledge and Data Engineering, 2019.

Leveraging Product Characteristics for Online Collusive Detection in Big Data Transactions

ABSTRACT:

Online fraud transactions have been a big concern for e-business platforms. With the development of big data technology, e-commerce users evaluate sellers according to the reputation scores supplied by the platform. Sellers chase high reputation scores because high reputations bring them high profits; by collusion, fraudsters can acquire high reputation scores that attract more potential buyers. Recognizing fake reputation information has therefore become a crucial task for e-commerce websites, and platforms try to solve this continuing and growing problem by adopting data mining techniques. With the rapid development of the Internet of Things, big data plays a crucial role in the economy and brings growth in different domains; it supports management and decision-making in e-business through the analysis of operational data. In online commerce, big data technology also helps provide users with a fair and healthy reputation system, which improves the shopping experience. This paper puts forward a conceptual framework to extract the characteristics of fraud transactions, including individual- and transaction-related indicators. It also introduces two product features, product type and product nature, which markedly enhance the accuracy of fraud detection. A real-world dataset is used to verify the effectiveness of these indicators in the proposed detection model, which distinguishes fraudulent transactions from legitimate ones.
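
ILLUSTRATIVE SKETCH (PYTHON):

As a hedged illustration of how the indicator framework could be operationalized, the sketch below feeds individual- and transaction-level numeric indicators plus the two categorical product features (product type, product nature) into a gradient-boosting classifier. The CSV file, column names, and model choice are assumptions, not the paper's exact detection model.

# Minimal sketch: fraud detection from individual, transaction, and product features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("transactions.csv")   # assumed: numeric + categorical columns and binary "is_fraud"
numeric = ["buyer_account_age", "order_amount", "orders_per_day", "repeat_buyer_ratio"]
categorical = ["product_type", "product_nature"]    # the two product features described above

pre = ColumnTransformer([("num", "passthrough", numeric),
                         ("cat", OneHotEncoder(handle_unknown="ignore"), categorical)])
model = Pipeline([("pre", pre), ("clf", GradientBoostingClassifier(random_state=42))])

X_tr, X_te, y_tr, y_te = train_test_split(df[numeric + categorical], df["is_fraud"],
                                          test_size=0.2, random_state=42, stratify=df["is_fraud"])
model.fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))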

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Suyuan Luo and Shaohua Wan, “Leveraging Product Characteristics for Online Collusive Detection in Big Data Transactions”, IEEE Access, 2019.

FunkR-pDAE: Personalized Project Recommendation Using Deep Learning

ABSTRACT:

In open source communities, developers need to spend plenty of time and energy discovering specific projects among a massive number of open source projects. Consequently, the study of personalized project recommendation for developers has important theoretical and practical significance. However, existing recommendation approaches have clear limitations, such as ignoring developers' operating behavior, social relationships, and practical skills, and they are very inefficient for large amounts of data. To address these limitations, this paper proposes FunkR-pDAE (Funk singular value decomposition Recommendation using the Pearson correlation coefficient and Deep Auto-Encoders), a novel personalized project recommendation approach using a deep learning model. FunkR-pDAE first extracts data related to developers and open source projects from open source communities and builds a developer-project relevance matrix and a developer-developer relevance matrix; the Pearson correlation coefficient is applied to the developer-developer relevance matrix to calculate developer similarity. Second, deep auto-encoders are used to learn the factor vectors that represent developers and open source projects. Finally, a sorting method is defined to provide personalized project recommendations. Experimental results on real-world GitHub data sets show that FunkR-pDAE achieves a precision of 75.46% and a recall of 40.32%, providing more effective recommendations than state-of-the-art approaches.
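
ILLUSTRATIVE SKETCH (PYTHON):

The sketch below illustrates two ingredients of the approach in isolation: the Pearson correlation coefficient used to derive a developer-developer similarity matrix (here computed directly from a toy developer-project relevance matrix), and a small auto-encoder whose bottleneck layer yields dense factor vectors for developers. The toy matrix, layer sizes, and training settings are assumptions; the Funk-SVD component and the final ranking step are not reproduced.

# Illustrative sketch: Pearson-based developer similarity + auto-encoder factor vectors.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(42)
R = rng.integers(0, 2, size=(50, 30)).astype("float32")   # toy: 50 developers x 30 projects

# Developer-developer similarity via the Pearson correlation coefficient.
dev_similarity = np.corrcoef(R)

# A small auto-encoder whose bottleneck gives each developer a dense factor vector.
inp = tf.keras.Input(shape=(R.shape[1],))
code = tf.keras.layers.Dense(8, activation="relu")(inp)          # factor vector (size assumed)
out = tf.keras.layers.Dense(R.shape[1], activation="sigmoid")(code)
autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(R, R, epochs=50, batch_size=16, verbose=0)

encoder = tf.keras.Model(inp, code)
dev_factors = encoder.predict(R, verbose=0)
print("similarity matrix:", dev_similarity.shape, "factor vectors:", dev_factors.shape)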

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Pengcheng Zhang, Fang Xiong, Hareton Leung, and Wei Song, “FunkR-pDAE: Personalized Project Recommendation Using Deep Learning”, IEEE Transactions on Emerging Topics in Computing, 2019.

Discovering the Type 2 Diabetes in Electronic Health Records using the Sparse Balanced Support Vector Machine

ABSTRACT:

The diagnosis of Type 2 Diabetes (T2D) at an early stage plays a key role in an adequate integrated T2D management system and in patient follow-up. Recent years have witnessed an increasing amount of available Electronic Health Record (EHR) data, and Machine Learning (ML) techniques have evolved considerably. However, managing and modeling this amount of information may lead to several challenges, such as overfitting, model interpretability, and computational cost. Starting from these motivations, we introduce an ML method called the Sparse Balanced Support Vector Machine (SB-SVM) for discovering T2D in a newly collected EHR dataset (named the FIMMG dataset). In particular, among all the EHR features related to exemptions, examinations, and drug prescriptions, we selected only those collected before the T2D diagnosis from a uniform age group of subjects. We demonstrate the reliability of the introduced approach with respect to other ML and deep learning approaches widely employed in the state of the art for solving this task. Results show that the SB-SVM outperforms the other state-of-the-art competitors, providing the best compromise between predictive performance and computation time. Additionally, the induced sparsity increases model interpretability while implicitly managing high-dimensional data and the usual unbalanced class distribution.
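
ILLUSTRATIVE SKETCH (PYTHON):

The paper's SB-SVM has its own formulation; the sketch below is only a rough stand-in that captures the two ingredients named above, sparsity and class balance, using an L1-penalised linear SVM with balanced class weights. The EHR CSV and column names are assumptions.

# Rough stand-in for a sparse, balanced SVM (not the authors' exact SB-SVM formulation).
import pandas as pd
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("ehr_features.csv")     # assumed: one row per patient, binary "t2d" label
X = df.drop(columns="t2d")
y = df["t2d"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
clf = LinearSVC(penalty="l1", dual=False, C=0.1, class_weight="balanced", max_iter=5000)
clf.fit(X_tr, y_tr)

print(classification_report(y_te, clf.predict(X_te)))
# The L1 penalty drives many coefficients to zero, which keeps the model interpretable.
print("non-zero features:", int((clf.coef_ != 0).sum()), "of", X.shape[1])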

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15'' LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating System : Windows 7
  • Coding Language : Python
  • Database : MySQL

REFERENCE:

Michele Bernardini, Luca Romeo, Paolo Misericordia and Emanuele Frontoni, “Discovering the Type 2 Diabetes in Electronic Health Records using the Sparse Balanced Support Vector Machine”, IEEE Journal of Biomedical and Health Informatics, 2019.