Effective Misinformation Organization and Management Using Data Classification

Abstract
With ubiquitous social media and networking services, anyone can now be a commentator or reporter. This information explosion benefits most people, but it can also have a serious negative impact, for example, leading people to refuse the COVID-19 vaccine because of false information spread intentionally or unintentionally. Misinformation identification and management are therefore critical if people are to actually benefit from the information they receive. Before identification and management can be applied, misinformation usually needs to be organized and saved in databases. This research organizes misinformation by type, such as health, politics, and businesses, so that detection can use the results to identify misinformation more accurately. The proposed method applies data/text mining and information retrieval techniques, including lexical analysis, stopword elimination, stemming, thesaurus building, and decision trees, to classify each piece of misinformation into one of the following classes: (i) businesses, (ii) health, (iii) politics, (iv) science, and (v) news. For example, misinformation that contains the keywords COVID-19, vaccine, coronavirus, and CDC may be classified as health misinformation. The classified misinformation, together with its keywords and phrases, is then saved in the database under its class for subsequent misinformation management steps such as identification. For example, misinformation detection normally relies on stored keywords and phrases such as credit cards, vaccine, or CDC to identify misinformation of the corresponding class.
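The preprocessing steps and decision-tree classification named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list, the thesaurus entries, the crude suffix-stripping stemmer (a simplified stand-in for a real stemmer such as Porter's), the ID3-style information-gain tree over binary "keyword present?" features, and the ten training sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on", "at",
             "and", "or", "for", "this", "that", "will", "your", "you", "it", "be"}

# Hypothetical thesaurus mapping variant (stemmed) terms to one canonical concept.
THESAURUS = {"coronavirus": "covid-19", "covid": "covid-19",
             "vaccination": "vaccine", "jab": "vaccine"}


def stem(word):
    """Crude suffix stripping; a simplified stand-in for a real stemmer such as Porter's."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith(("ss", "us")) and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Lexical analysis, stopword elimination, stemming, then thesaurus normalization."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())  # keeps "covid-19" whole
    tokens = [t for t in tokens if t not in STOPWORDS]
    return {THESAURUS.get(stem(t), stem(t)) for t in tokens}


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, features):
    """ID3-style tree; a leaf is a class label, a node is (term, present_subtree, absent_subtree)."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(term):
        # Information gain of splitting the rows on whether `term` occurs.
        with_t = [label for terms, label in rows if term in terms]
        without_t = [label for terms, label in rows if term not in terms]
        if not with_t or not without_t:
            return 0.0
        n = len(rows)
        return (entropy(labels) - len(with_t) / n * entropy(with_t)
                - len(without_t) / n * entropy(without_t))

    best = max(sorted(features), key=gain)  # sorted() makes tie-breaking deterministic
    if gain(best) <= 0:
        return majority
    remaining = features - {best}
    present = [row for row in rows if best in row[0]]
    absent = [row for row in rows if best not in row[0]]
    return (best, build_tree(present, remaining), build_tree(absent, remaining))


def classify(tree, terms):
    """Walk the tree using the set of preprocessed terms of one document."""
    while isinstance(tree, tuple):
        term, present, absent = tree
        tree = present if term in terms else absent
    return tree


# Hypothetical toy training set (two examples per class), for illustration only.
TRAINING = [
    ("Coronavirus vaccines change your DNA, the CDC admits", "health"),
    ("Drinking hot water cures COVID-19 in a day", "health"),
    ("Senator secretly sold votes before the election", "politics"),
    ("Millions of dead voters cast ballots in the election", "politics"),
    ("Bank will refund every credit card fee this week", "businesses"),
    ("Stock prices double after the bank merger", "businesses"),
    ("Scientists confirm the moon landing was staged", "science"),
    ("Physics study proves the earth is flat", "science"),
    ("Breaking news: famous actor arrested at airport", "news"),
    ("Viral report says city shut down all schools today", "news"),
]

rows = [(preprocess(text), label) for text, label in TRAINING]
vocabulary = set().union(*(terms for terms, _ in rows))
tree = build_tree(rows, vocabulary)

print(classify(tree, preprocess("CDC says the new coronavirus vaccine is unsafe")))  # health
```

With such a tiny training set the tree simply learns a few distinguishing keywords (here bank, covid-19, and election); a practical system would train on a much larger labeled corpus and store each class's keywords in the database for the later detection step.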

Keywords
security, misinformation, misinformation classification, text mining, data mining, data classification, decision tree, similarity measurement, sentence similarity

Conference