Natural Language Processing in Computer Science Research
Natural language processing (NLP) is a sub-domain of computer science, information engineering, and artificial intelligence that deals with the interaction between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data. Big Data is another exciting area of computer science research, one that helps to tackle massive amounts of data. NLP is one of the most widely pursued computer science research areas.
Anthony Pesce said “You usually hear about it in the context of analysing large pools of legislation or other document sets, attempting to discover patterns or root out corruption.”
A 2017 Tractica report on the natural language processing (NLP) market estimates the total NLP software, hardware, and services market opportunity to be around $22.3 billion by 2025. The report also forecasts that NLP software solutions leveraging AI will see a market growth from $136 million in 2016 to $5.4 billion by 2025.
Keyword extraction means the automatic identification of terms that best describe the subject of a document. Key-phrases, key segments or just keywords are used for defining the words that represent the most relevant information contained in the text. Although the terminology is different, the function is the same: characterization of the topic discussed in a document.
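The simplest form of the idea above can be sketched with plain term frequency. The stopword list and the scoring rule below are illustrative assumptions, not a production keyword-extraction method:

```python
# Naive frequency-based keyword extraction: count content words and
# return the most frequent ones as "keywords".
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "that", "for"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

doc = ("Keyword extraction identifies the terms that best describe a "
       "document. Keyword scoring here is plain term frequency, so the "
       "word keyword dominates this toy document.")
print(extract_keywords(doc, top_n=1))  # most frequent content word
```

Real keyword extractors weight terms against a background corpus (e.g. TF-IDF) rather than using raw counts.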
Text summarization is the process of shortening a text document in order to create a summary of the major points of the original document. The main idea of summarization is to find a subset of the data which contains the “information” of the entire set. It is a very important area of computer science research, and it also extends the field to extracting summaries from streaming data.
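The "find a subset that carries the information" idea can be sketched as a naive extractive summarizer. Scoring sentences by average word frequency is an assumption of this sketch, not a standard named algorithm:

```python
# Naive extractive summarization: score each sentence by the average
# document-wide frequency of its words and keep the top-scoring ones.
from collections import Counter
import re

def summarize(text, n_sentences=1):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(re.findall(r"[a-z]+", text.lower()))
    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)
    ranked = sorted(sentences, key=score, reverse=True)
    return " ".join(ranked[:n_sentences])

text = "Cats sleep. Cats sleep a lot. Dogs bark."
print(summarize(text))
```

Modern summarizers (including abstractive ones) use learned models, but the extractive subset-selection framing is the same.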
Event Detection in Text
Event detection is the analysis of text documents with the aim of uncovering real events happening in the world. It is based on the assumption that words appearing in similar documents and time windows are likely to concern the same real-world event. Such a method therefore attempts to group together words with similar temporal and semantic characteristics while discarding noisy words that do not contribute anything of interest. This results in a concise event representation through a set of representative keywords.
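The grouping step can be illustrated with document co-occurrence alone. The greedy merging and the Jaccard threshold below are assumptions made for this sketch; real event-detection systems also use the temporal dimension:

```python
# Toy word grouping by document co-occurrence: words that appear in
# largely the same documents are merged into one "event" group.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def group_words(word_docs, threshold=0.5):
    """word_docs maps each word to the set of document ids it appears in.
    Greedily merge words whose document sets overlap enough."""
    groups = []
    for word, docs in sorted(word_docs.items()):
        for group in groups:
            if jaccard(docs, group["docs"]) >= threshold:
                group["words"].append(word)
                group["docs"] |= docs
                break
        else:
            groups.append({"words": [word], "docs": set(docs)})
    return [sorted(g["words"]) for g in groups]

word_docs = {
    "earthquake": {1, 2, 3},
    "magnitude":  {1, 2},
    "election":   {7, 8},
    "vote":       {7, 8, 9},
}
print(group_words(word_docs))
```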
Stemming and Lemmatization
Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science research since the 1960s.
Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form.
Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced “accuracy” may not matter for some applications. In fact, when used within information retrieval systems, stemming improves recall (the true positive rate) compared to lemmatisation. Nonetheless, stemming reduces precision for such systems.
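A stemmer's context-free, rule-based nature is easy to see in code. The rules below are a toy subset in the spirit of early suffix-stripping algorithms, not the Porter stemmer:

```python
# Minimal suffix-stripping stemmer: strip a common inflectional suffix,
# then collapse a doubled final consonant (runn -> run).

def stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

print([stem(w) for w in ["running", "connected", "cats", "maps"]])
```

Note that such a stemmer maps "meeting" (noun) and "meeting" (verb) to the same stem regardless of part of speech, which is exactly the distinction a lemmatizer can make.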
Network Science in Computer Science Research
Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The analysis of networks has received a major boost in recent years from the widespread availability of huge network data resources.
The great challenge for network science scholars is that they need both a common foundational training in the fundamental language, methods, and theories of network science, and a theoretical and substantive foundation in a particular discipline in which to apply these perspectives. Research on network connections among multiple types and levels of “actors” offers a potentially powerful mechanism for understanding the workings of complex systems across broad areas of science.
Community detection is very important for revealing the structure of social networks, digging into people's views, analysing information dissemination, and grasping as well as guiding public sentiment. In recent years, as community detection has become an important field of social network analysis, a large body of academic literature has proposed numerous methods of community detection.
With the development of the Internet and computer science, more and more people join social networks. People communicate with each other and express their opinions on social media, which forms a complex network of relationships. Community detection in network science has therefore proved to be a very important computer science research topic. Individuals in social networks form a “relation structure” through various connections, which produces a large amount of information dissemination. This “relation structure” is the community to be researched.
To disseminate, in the field of communication, means to broadcast a message to the public without direct feedback from the audience.
Interconnections between people on social network sites enhance the process of information dissemination and amplify the influence of that information. One study designed a Facebook application to examine the influence of people's networks on information dissemination. The results showed that:
- Both network degree and network cluster significantly affected information dissemination frequency.
- People with more connections and with high clustered connections might exert a greater influence on their information dissemination process.
The findings of this study have useful implications for the theory of network effect, as well as useful references and suggestions for marketers.
The dissemination strategy defines a consistent approach to key target groups and will be based on a target group analysis with support of key stakeholders, including also language adaptation and content translation by the respective local partners.
Influential Nodes Detection
A social network is an abstract representation of a social system in which ideas and information propagate through interactions between individuals. An essential problem is to find the set of most influential individuals in a social network, so that they can spread influence over the largest range of the network. Traditional methods for identifying influential nodes in networks are based on greedy algorithms or specific centrality measures. Some recent research has shown that community structure, a common and important topological property of social networks, has a significant effect on the dynamics of networks. However, most influence maximization methods do not take the community structure of the network into consideration, which limits their application to social networks with community structure.
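The simplest centrality measure mentioned above, degree centrality, can be sketched on a toy graph. The graph and the choice of degree as the measure are illustrative assumptions; real influence-maximization methods are far more involved:

```python
# Degree-centrality sketch of influential-node detection: rank nodes by
# their number of neighbours in an undirected social graph.

graph = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave":  {"alice"},
}

def top_influencers(graph, k=1):
    """Return the k nodes with the highest degree."""
    return sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:k]

print(top_influencers(graph))
```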
Heterogeneous Information Networks
The network schema of a heterogeneous information network specifies type constraints on the sets of objects and relationships among the objects. These constraints make a heterogeneous information network semi-structured, guiding the semantics explorations of the network. An information network following a network schema is called a network instance of the network schema.
- Heterogeneous networks include different types of nodes or links.
- A heterogeneous network can be converted into a homogeneous network through network projection or by ignoring object heterogeneity, but this causes significant information loss.
- Most real systems include multi-typed interacting objects. For example, a social media website (e.g., Facebook) contains a set of object types, such as users, posts, and tags, and a health care system contains doctors, patients, diseases, and devices. Generally speaking, these interacting systems can all be modelled as heterogeneous information networks.
4G Networks in Computer Science Research
User requirements are growing faster than ever, and the limitations of current mobile communication systems have forced researchers to come up with more advanced and efficient technologies. 4G mobile technology is the next step in this direction. 4G is the next generation of wireless networks that will totally replace 3G networks. It is expected to provide its customers with better speed and all-IP-based multimedia services. 4G is all about an integrated, global network that will be able to provide a comprehensive IP solution where voice, data and streamed multimedia can be delivered to users on an “Anytime, Anywhere” basis.
4G presents a solution to this problem, as it is all about seamlessly integrating terminals, networks and applications. The race to implement 4G is accelerating as well as quite challenging. Similarly, 5G is an upcoming computer science research topic which is gaining attention.
Evolution of 4G technology
| Feature | 1G | 2G | 2.5G | 3G | 4G |
| --- | --- | --- | --- | --- | --- |
| Services | Analog voice | Digital voice | Higher capacity, packetized data | Higher capacity, broadband data up to 2 Mbps | Completely IP-based, speeds up to hundreds of Mbps |
| Standards | NMT, AMPS, Hicap, CDPD, TACS, ETACS | GSM, iDEN, D-AMPS | GPRS, EDGE, etc. | WCDMA, CDMA2000 | Single standard |
| Data Bandwidth | 1.9 kbps | 14.4 kbps | 384 kbps | 2 Mbps | 200 Mbps |
| Multiplexing | FDMA | CDMA, TDMA | CDMA, TDMA | CDMA | CDMA |
| Core Network | PSTN | PSTN | PSTN, packet network | Packet network | Internet |
Features of fourth generation technology
Industry experts say that users will not be able to take advantage of rich multimedia content across wireless networks with 3G. In contrast, 4G will feature extremely high quality video, comparable to HD (high definition) TV. Wireless downloads at speeds reaching 100 Mbps, i.e. 50 times that of 3G, are possible with 4G.
Interoperability and easy roaming
Multiple 3G standards make it difficult to roam and interoperate across various networks, whereas 4G provides a global standard that enables global mobility. Various heterogeneous wireless access networks typically differ in terms of coverage, data rate, latency, and loss rate, and each of them is therefore designed in practice to support a different set of specific services and devices. 4G will encompass various types of terminals, which may have to provide common services independently of their capabilities.
Fully converged services
A user who wants to access the network from many different platforms (cell phones, laptops, PDAs) is free to do so in 4G, which delivers connectivity intelligent and flexible enough to support streaming video, VoIP telephony, still or moving images, e-mail, Web browsing, e-commerce, and location-based services through a wide variety of devices. That means freedom for consumers.
4G systems will prove far cheaper than 3G, since they can be built atop existing networks and will not require operators to completely retool or carriers to purchase costly extra spectrum. In addition to being a lot more cost efficient, 4G is spectrally efficient, so carriers can do more with less.
Challenges in Migration to 4G in Computer Science Research
- Multimode user terminals : With 4G there will be a need to design a single user terminal that can operate in different wireless networks and overcome design problems such as limitations in the size of the device, its cost and its power consumption. This problem can be solved by using a software radio approach, i.e. the user terminal adapts itself to the wireless interfaces of the network.
- Selection among various wireless systems : Every wireless system has its unique characteristics and roles. The proliferation of wireless technologies complicates the selection of most suitable technology for a particular service at a particular place and time. This can be handled by making the selection according to the best possible fit of user QoS requirements and available network resources.
- Security : The heterogeneity of wireless networks complicates the security issue. Dynamic, reconfigurable, adaptive and lightweight security mechanisms should be developed.
- Network infrastructure and QoS support : Integrating the existing non-IP and IP-based systems and providing QoS guarantee for end-to-end services that involve different systems is also a big challenge.
- Charging/billing : It is troublesome to collect, manage and store customers' account information from multiple service providers. Similarly, billing customers in a simple yet informative way is not an easy task.
- Attacks on the application level : 4G cellular wireless devices will be known for software applications which will provide innovative features to the user but will introduce new security holes, leading to more attacks at the application level.
- Jamming and spoofing : Spoofing refers to fake GPS signals being sent out, in which case the GPS receiver thinks that the signals come from a satellite and calculates the wrong co-ordinates. Criminals can use such techniques to interfere with police work. Jamming happens when a transmitter sending out signals at the same frequency displaces a GPS signal.
- Data encryption : If a GPS receiver has to communicate with the central transmitter then the communication link between these two components is not hard to break and there is a need of using encrypted data.
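The selection challenge above, matching user QoS requirements against available network resources, can be sketched as a simple scoring problem. The network names, specifications, and the selection rule are all hypothetical, invented for this illustration:

```python
# Sketch of QoS-driven network selection: filter candidate networks by
# the user's rate and latency requirements, then pick the fastest one.

def select_network(networks, required_rate_mbps, max_latency_ms):
    """Return the name of the qualifying network with the highest data
    rate, or None if no network satisfies the requirements."""
    candidates = [
        (name, spec) for name, spec in networks.items()
        if spec["rate_mbps"] >= required_rate_mbps
        and spec["latency_ms"] <= max_latency_ms
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1]["rate_mbps"])[0]

networks = {
    "wlan":     {"rate_mbps": 54.0, "latency_ms": 20},
    "cellular": {"rate_mbps": 2.0,  "latency_ms": 120},
}
print(select_network(networks, required_rate_mbps=1.0, max_latency_ms=150))
```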
Scope in 4G
There are several technologies suggested to deploy in the 4G and these may include:
- Software Defined Radio (SDR): is a radio communication system where components that have typically been implemented in hardware (i.e. mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are instead implemented using software on a personal computer or other embedded computing devices.
- Orthogonal frequency-division multiplexing (OFDM): is a frequency-division multiplexing (FDM) scheme utilized as a digital multi-carrier modulation method.
- Multiple-input and multiple-output (MIMO): is the use of multiple antennas at both the transmitter and receiver to improve communication performance.
4G will certainly add perceived benefit to an ordinary person's life over 3G. 4G will be an intelligent technology that will interconnect the entire world seamlessly. The projected 4G mobile communication system will reduce the number of different technologies to a single global standard. Technologies are evolving day and night, but the final success of 4G mobile communication will depend upon the new services and contents made available to users. These new applications must meet user expectations and give added value over existing offers.
Artificial Intelligence in Computer Science Research
Artificial intelligence is the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI has become an important part of our daily life: this technology is used in a wide range of day-to-day services, and our dependency has increased so much that people keep finding more and more ways to make everyday chores easier. Research in this area has no end in sight, because the possibilities keep growing along with the benefits and needs. AI also improves work efficiency: without human interference, errors are reduced and accuracy increases. Artificial intelligence is a very broad field that cannot be defined just by robots and machines; it is an area of study in computer science that focuses on creating computer systems that perform tasks requiring human intelligence. There are many recent research trends in artificial intelligence, and many academic researchers and practitioners are working on these topics.
Soft computing is considered an important tool for performing several computing operations, including neural networks, fuzzy logic, approximate reasoning, and evolutionary algorithms such as genetic algorithms and simulated annealing. Soft computing allows human knowledge to be incorporated effectively, deals with uncertainty and imprecision, and learns to adapt to unknown or changing environments for better performance. As soft computing does not perform much symbolic manipulation, we can view it as a new discipline that complements conventional artificial intelligence (AI) approaches, and vice versa.
Machine Learning is a current application of AI based around the idea that we should really just be able to give machines access to data and let them learn for themselves.
Two important breakthroughs led to the emergence of Machine Learning as the vehicle which is driving AI development forward with the speed it currently has.
- One of these was the realization, credited to Arthur Samuel in 1959, that rather than teaching computers everything they need to know about the world and how to carry out tasks, it might be possible to teach them to learn for themselves.
- The second, more recently, was the emergence of the internet, and the huge increase in the amount of digital information being generated, stored, and made available for analysis.
Once these innovations were in place, engineers realized that rather than teaching computers and machines how to do everything, it would be far more efficient to code them to think like human beings, and then plug them into the internet to give them access to all of the information in the world.
Deep learning is a subset of AI and machine learning that uses multi-layered artificial neural networks to deliver state-of-the-art accuracy in tasks such as object detection, speech recognition, language translation and others.
Deep learning differs from traditional machine learning techniques in that it can automatically learn representations from data such as images, video or text, without introducing hand-coded rules or human domain knowledge. Its highly flexible architectures can learn directly from raw data and can increase their predictive accuracy when provided with more data.
Statistics and Probability
Fifty years ago, AI programs were implemented using logic in which each proposition and inference was either true or false. However, the world is not so black and white, and hence there was a need to represent uncertainty and draw inferences from partial information. This has reduced, but not eliminated, the issue of “brittleness” in AI systems. The use of probability has become even more popular because of the success of machine learning technology. Systems based on Bayes' theorem are the most prevalent, followed by neural networks.
When there is not enough perfect information, then there will be variables and a margin of error. The role of probability is to reduce uncertainty despite this. Statements about the future, for instance, often beg for more data.
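A worked Bayes' theorem example makes the uncertainty argument concrete. The numbers below are made up for illustration: a diagnostic test that is 99% sensitive and 95% specific, for a condition with 1% prevalence:

```python
# Bayes' theorem: P(condition | positive) =
#   P(positive | condition) * P(condition) / P(positive)

def bayes_posterior(prior, sensitivity, specificity):
    """Probability of the condition given a positive test result."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

posterior = bayes_posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(posterior, 3))  # 0.167: a positive test is far from certain proof
```

Even with an accurate test, the posterior is only about 17%, which is exactly the kind of partial-information inference that pure true/false logic cannot express.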
Social Media Analysis in Computer Science Research
Social media analytics (SMA) refers to the approach of collecting data from social media sites and blogs and evaluating that data to make business decisions. This process goes beyond the usual monitoring or a basic analysis of retweets or “likes” to develop an in-depth idea of the social consumer.
Online social networks, such as Facebook and Twitter, have become increasingly popular over the last few years. People use social networks to stay in touch with family, chat with friends, and share news. The users of a social network build, over time, connections with their friends, colleagues, and, in general, people they consider interesting or trustworthy. These connections form a social graph that controls how information spreads in the social network. Typically, users receive messages published by the users they are connected to, in the form of wall posts, tweets, or status updates.
Identifying Compromised Accounts
A compromised account is an existing, legitimate account that has been taken over by an attacker. Accounts can be compromised in a number of ways, for example, by exploiting a cross-site scripting vulnerability or by using a phishing scam to steal the user’s login credentials. Also, bots have been increasingly used to harvest login information for social networking sites on infected hosts.
To address the growing problem of malicious activity on social networks, researchers have started to propose different detection and mitigation approaches. Initial work has focused on the detection of fake accounts (i.e., automatically created accounts with the sole purpose of spreading malicious content).
Traditional anomaly detection on social media mostly focuses on individual point anomalies while anomalous phenomena usually occur in groups. Therefore it is valuable to study the collective behaviour of individuals and detect group anomalies. Existing group anomaly detection approaches rely on the assumption that the groups are known which can hardly be true in real world social media applications.
Trends and Event Detection
Trending topics in social streams are sets of words that frequently appear in discussion, often in response to recent real-world events. A set of words or phrases that is tagged at a greater rate than other sets is said to be a “trending topic.” Topics become trending either through a concerted effort by users or because of an event that prompts people to talk about a specific topic. These trends help users understand what is happening in the world in real time. Furthermore, marketers and companies use trend detection tools to discover emerging trends, capture the popularity of products and campaigns, and design new marketing strategies based on the extracted trends.
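A minimal trend detector compares term counts in the current time window against a previous one and flags the terms whose rate grew the most. The ratio-based score with add-one smoothing is an assumption of this sketch:

```python
# Sketch of trending-topic detection over two time windows: a term is
# "trending" if its current count is large relative to its past count.
from collections import Counter

def trending(previous, current, top_n=1):
    prev = Counter(previous)
    curr = Counter(current)
    # add-one smoothing so terms unseen in the past do not divide by zero
    score = {t: curr[t] / (prev[t] + 1) for t in curr}
    return sorted(score, key=score.get, reverse=True)[:top_n]

previous_window = ["weather", "news", "sports", "news"]
current_window = ["eclipse", "eclipse", "eclipse", "news"]
print(trending(previous_window, current_window))
```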
Sentiment Analysis
Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.
Sentiment analysis is extremely useful in social media monitoring as it allows us to gain an overview of the wider public opinion behind certain topics. The ability to extract insights from social data is a practice that is being widely adopted by organisations across the world. It can also be an essential part of your market research and customer service approach. Not only can you see what people think of your own products or services, you can see what they think about your competitors too. The overall customer experience of your users can be revealed quickly with sentiment analysis, but it can get far more granular too.
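The lexicon-based family of sentiment methods can be sketched in a few lines. The tiny word lists below are illustrative, not a real sentiment lexicon, and the sign-of-the-score rule is an assumption of the sketch:

```python
# Lexicon-based sentiment scoring: count positive and negative words
# and classify by the sign of the difference.
import re

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment(text):
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this product, it is great"))   # positive
print(sentiment("Terrible service and poor quality"))  # negative
```

Production systems use learned classifiers that handle negation, sarcasm, and context, but the granular per-text scoring described above works the same way in principle.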
People increasingly use social media to get first-hand news and information. Using credible information is a prerequisite for accurate analysis utilizing social media data; non-credible data will lead to inaccurate analysis, decision making and predictions. Credibility is defined as “the quality of being trustworthy”. In communication research, information credibility has three parts: message credibility, source credibility, and media credibility. Compared with conventional media, assessing information credibility in social media is the more challenging problem. Credibility analysis is one of the strongest fields of computer science research and has recently been discussed widely in traditional media, especially in India. There are many social media security issues, and governments are looking for solid computer science research and researchers to overcome this problem of credibility in society.
Speech recognition in Computer Science Research
Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Rudimentary speech recognition software has a limited vocabulary of words and phrases, and it may only identify these if they are spoken very clearly.
Text to Speech Conversion
Text to speech, abbreviated as TTS, is a form of speech synthesis that converts text into spoken voice output. Text to speech systems were first developed to aid the visually impaired by offering a computer-generated spoken voice that would “read” text to the user. During database creation, all recorded speech is segmented into some or all of the following: diaphones, syllables, morphemes, words, phrases, and sentences. To reproduce words from a text, the TTS system begins by carrying out a sophisticated linguistic analysis that transposes written text into phonetic text.
Speech to Text Conversion
Speech to text conversion is the process of converting spoken words into written text. The term voice recognition should be avoided, as it is often associated with the process of identifying a person from their voice, i.e. speaker recognition. Voice recognition is a computer software program or hardware device with the ability to decode the human voice. Voice recognition is commonly used to operate a device, perform commands, or write without having to use a keyboard, mouse, or buttons.
Sound is a vibration that propagates as a typically audible mechanical wave of pressure and displacement through a medium.
Common uses of identifying sounds are:
- Event Detection
- Song Recognition
- Noise Cancellation
- Voice Recognition
- Environmental Condition Detection
- Mapping Music Composition
Sound based Event Detection
Automatic sound event detection (also called acoustic event detection) is one of the emerging topics of CASA research. Sound event detection aims at processing the continuous acoustic signal and converting it into symbolic descriptions of the corresponding sound events present at the auditory scene. Sound event detection can be utilized in a variety of applications, including context-based indexing and retrieval in multimedia databases, unobtrusive monitoring in health care, and surveillance. Furthermore, the detected events can be used as mid-level-representation in other research areas, e.g. audio context recognition, automatic tagging, and audio segmentation.
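The core of many sound event detectors is a simple framing-and-thresholding step. The synthetic signal, frame size, and energy threshold below are assumptions for illustration; real systems use learned acoustic models on richer features:

```python
# Energy-threshold sketch of sound event detection: split the signal into
# fixed-size frames and flag frames whose mean squared amplitude exceeds
# a threshold.

def detect_events(signal, frame_size, threshold):
    """Return the indices of frames that contain a sound event."""
    events = []
    for i in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[i : i + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        if energy > threshold:
            events.append(i // frame_size)
    return events

# quiet ... loud burst ... quiet
signal = [0.01] * 4 + [0.9, -0.8, 0.7, -0.9] + [0.02] * 4
print(detect_events(signal, frame_size=4, threshold=0.1))  # [1]
```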
Video Processing in Computer Science Research
In electronics engineering, video processing is a particular case of signal processing, in particular image processing, which often employs video filters and where the input and output signals are video files or video streams. Video processing technology has revolutionized the world of multimedia with products such as the Digital Versatile Disc (DVD), the Digital Satellite System (DSS), high definition television (HDTV), and digital still and video cameras. The different areas of video processing include:
- Video Compression
- Video Indexing
- Video Segmentation
- Video tracking etc.
With the emergence of video content as an effective mode of information propagation, automating the process of summarizing a video has become paramount. Video summarization has recently emerged as a challenging problem in the field of machine learning; it aims at automatically evaluating the content of a video and generating a summary with its most relevant content. Video summarization finds applications in generating highlights for sports events and trailers for movies, and in general in shortening videos to their most relevant subsequences, allowing humans to browse large repositories of videos efficiently.
Information Extraction from Videos
The ability to extract names of organizations, people, locations, dates and times (i.e. “named entities”) is essential for correlating occurrences of important facts, events, and other metadata in the video library, and is central to production of information collages. Our techniques extract named entities from the output of speech recognition systems and OCR applied to the video stream, integrating across modalities to achieve better results. Current approaches have significant shortcomings. Most methods are either rule-based [Maybury96, Mani97], or require significant amounts of manually labelled training data to achieve a reasonable level of performance [BBN98]. The methods may identify a name, company, or location, but this is only a small part of the information that should be extracted; we would like to know that a particular person is a politician and that a location is a vacation resort.
To detect video forgery, one may think of applying an image forgery detection method to each frame of a given video sequence. Forgery detection techniques for a video are classified into two types:
- Inter-video approaches
- Intra-video approaches
As mentioned earlier, detection of replacement and duplication in videos has been studied by Wang and Farid:
- They have also developed an inconsistency-based detection method that checks the consistency of the de-interlacing parameters used to convert an interlaced video into a non-interlaced form.
- Since interlaced videos have half the vertical resolution of the original video, the de-interlacing process fully exploits insertion, duplication, and interpolation of frames to create a full-resolution video.
- They also suggested that the motion between fields of a frame is closely related across fields in interlaced videos. Evaluating the interference to this relationship caused by tampering allows their system to detect forgeries in an interlaced video.
- If a region is imprinted by another region in the same video, the correlation between the regions takes an unnaturally high value.
- In contrast, noise residuals of the synthesized textured region from another video exhibit low coherence with the noise residual of other regions.
Image Processing in Computer Science Research
Image processing is a method to convert an image into digital form and perform some operations on it, in order to get an enhanced image or to extract some useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or photograph, and the output may be an image or characteristics associated with that image. Usually an image processing system treats images as two-dimensional signals and applies established signal processing methods to them.
It is among rapidly growing technologies today, with its applications in various aspects of a business. Image Processing forms core research area within engineering and computer science research disciplines too.
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as super-pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
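The label-assignment definition above can be illustrated with the simplest possible method, intensity thresholding, on a tiny grayscale image. The image values and threshold are invented for this sketch; real segmentation methods (region growing, graph cuts, learned models) are far more involved:

```python
# Threshold-based segmentation: assign label 1 (foreground) to pixels
# brighter than the threshold and label 0 (background) otherwise.

def threshold_segment(image, threshold):
    """Return a binary label map for a 2D grayscale image."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

image = [
    [ 10,  12, 200],
    [ 11, 220, 210],
    [  9,  13,  15],
]
labels = threshold_segment(image, threshold=128)
for row in labels:
    print(row)
```

Every pixel receives a label, and pixels sharing a label share the characteristic (here, brightness), exactly as in the definition above.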
Object detection in computer vision is an important computer science research topic. Object detection is the process of finding instances of real-world objects such as faces, bicycles, and buildings in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category.
The presence of an object can be detected with proximity sensors, and there are different kinds of sensor technologies, including ultrasonic, capacitive, photoelectric, inductive, and magnetic sensors. Object tracking can also work using proximity sensors.
Topic Detection from Image
In natural images, some documents are embedded and many other objects are shown through which one can identify a topic. Text detection is the identification of text in natural images. Basic digital image processing techniques are used to detect text in images, including pre-processing, extraction or text localization, classification and character detection, for which several different classification methods are used.
In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modelling is a frequently used text-mining tool for discovery of hidden semantic structures in a document.
A large amount of data is collected every day. As more information becomes available, it becomes difficult to find what we are looking for, so we need tools and techniques to organize, search and understand vast quantities of information.
Topic modelling provides us with methods to organize, understand and summarize large collections of textual information. It helps in:
- Discovering hidden topical patterns that are present across the collection
- Annotating documents according to these topics
- Using these annotations to organize, search and summarize texts
Topic modelling can be described as a method for finding a group of words (i.e. topic) from a collection of documents that best represents the information in the collection. It can also be thought of as a form of text mining – a way to obtain recurring patterns of words in textual material.
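The recurring-pattern intuition above can be sketched very naively: report the words that recur across at least half of the documents as the collection's "topic words". Real topic models such as LDA are probabilistic and far more powerful; the cutoff and length filter here are assumptions of the sketch:

```python
# Naive "topic word" extraction: a word represents the collection if it
# appears in at least min_fraction of the documents.
import re

def topic_words(documents, min_fraction=0.5):
    doc_sets = [set(re.findall(r"[a-z]+", d.lower())) for d in documents]
    vocab = set().union(*doc_sets)
    cutoff = min_fraction * len(documents)
    return sorted(w for w in vocab
                  if sum(w in s for s in doc_sets) >= cutoff and len(w) > 3)

docs = [
    "neural networks learn representations from data",
    "deep neural networks need a lot of data",
    "training data improves neural networks",
]
print(topic_words(docs))
```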