NUST EME: Extraction and Evaluation of Cyber Social Structures

PROJECT OVERVIEW

Abstract

When a computer network connects people or organizations, it is a social network. Yet the study of such computer-supported social networks has not received as much attention as studies of human-computer interaction, online person-to-person interaction, and computer-supported communication within small groups. I here argue the usefulness of a social network approach for the study of computer-mediated communication via e-mail. I review some basic concepts of social network analysis, describe how to collect and analyze social network data, and demonstrate where social network data can be, and have been, used to study computer-mediated communication. Throughout, I show the utility of the social network approach for studying computer-mediated communication, be it in computer-supported cooperative work, in virtual communities, or in more diffuse interactions over less bounded systems such as the Internet (e.g., Yahoo, Hotmail).

Problem Definition

To analyze human social networks precisely, a correct and sufficient amount of data about human interactions must be available. With recently developed networks reaching data rates of 10/100 Gigabit, architectures that can collect data at such rates are highly desirable. The unpredictable traffic statistics of these networks make it difficult to capture the right amount of data. Two common characteristics of such networks are out-of-order packet delivery and millions of simultaneous connections at the TCP layer. In such a scenario, a naive attempt to reassemble the packets lets memory consumption grow without bound. For example, if a network carries about 3 million TCP connections and each connection takes 1 MB on average, the memory required is about 3 TB, which is impractical.
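The memory estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using the figures from the text (3 million connections, 1 MB each); the function name and the decimal definition of a megabyte are assumptions made for illustration.

```python
MB = 10**6   # decimal megabyte, assumed for this estimate
TB = 10**12  # decimal terabyte


def reassembly_memory_bytes(connections: int, bytes_per_connection: int) -> int:
    """Memory needed if every connection is buffered in full at the same time."""
    return connections * bytes_per_connection


total = reassembly_memory_bytes(3_000_000, 1 * MB)
print(f"{total / TB:.1f} TB of reassembly buffers")
```

The result grows linearly with the number of tracked connections, which is why a per-connection buffer limit matters more than any single figure.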

The next problem that hinders evaluation is packet capture. On networks running at 10/100 Gigabit rates, which are now commonly encountered, standard 10/100/1000 Ethernet LAN cards will simply fail. These cards are interrupt-driven: they interrupt the OS whenever a new packet arrives. If packets arrive at 10/100 Gigabit rates, servicing the interrupts consumes all available capacity and packets start being dropped.
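To give a sense of the interrupt load, the following sketch estimates the worst-case packet rate on a link. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap), and one interrupt per packet with no coalescing. These assumptions are not taken from the project itself.

```python
MIN_FRAME_BYTES = 64      # minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Worst-case packet rate for a link carrying back-to-back frames."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


for gbps in (1, 10, 100):
    pps = packets_per_second(gbps * 1e9)
    budget_ns = 1e9 / pps  # time available to handle one packet
    print(f"{gbps:>3} Gbit/s: {pps / 1e6:7.2f} Mpps, {budget_ns:6.2f} ns per packet")
```

At 10 Gbit/s this comes to roughly 14.88 million packets per second, leaving about 67 ns per packet, which illustrates why a per-packet interrupt model does not scale.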

Another major problem is separating the targeted data into its respective domains (e.g., Hotmail, Yahoo). This stage is required to reduce the load on the filtering machine, so parsing the domains efficiently is again a main concern.
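In software, the domain separation step might look like the sketch below. It assumes that a host name (for example from an HTTP Host header) has already been extracted from the traffic; the domain list and the function name are hypothetical, and the project's hardware parser may work quite differently.

```python
from typing import Optional

# Hypothetical list of targeted mail domains.
TARGET_DOMAINS = ("hotmail.com", "yahoo.com")


def classify_host(host: str) -> Optional[str]:
    """Return the targeted domain a host belongs to, or None if it is not targeted."""
    host = host.strip().lower()
    host = host.split(":", 1)[0]  # drop an optional ":port" suffix
    host = host.rstrip(".")       # drop a trailing root dot
    for domain in TARGET_DOMAINS:
        # Match the domain itself or any subdomain, but not look-alike names.
        if host == domain or host.endswith("." + domain):
            return domain
    return None


print(classify_host("mail.yahoo.com"))
print(classify_host("example.org"))
```

Traffic that maps to None can be discarded early, so the filtering machine only sees data for the targeted domains.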

Even once proper data is available, analyzing it, mostly gigabytes in size, requires a lot of processing. Detecting and identifying duplicate personalities and mapping them onto a single identity with the desired connections requires a strong and flexible design.
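One way to map duplicate personalities onto a single identity is a union-find (disjoint-set) structure. The sketch below assumes that pairs of addresses belonging to the same person have already been detected by some other step; the class name, function name, and sample addresses are illustrative only and are not taken from the project.

```python
class IdentityMerger:
    """Union-find over e-mail addresses; each set represents one person."""

    def __init__(self):
        self.parent = {}

    def find(self, address):
        """Return the canonical address for the identity that owns `address`."""
        self.parent.setdefault(address, address)
        while self.parent[address] != address:
            # Path halving keeps later lookups short.
            self.parent[address] = self.parent[self.parent[address]]
            address = self.parent[address]
        return address

    def merge(self, a, b):
        """Record that addresses a and b belong to the same person."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller address as the canonical one.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def merge_edges(edges, merger):
    """Rewrite (sender, receiver) edges onto canonical identities, dropping self-loops."""
    merged = set()
    for sender, receiver in edges:
        s, r = merger.find(sender), merger.find(receiver)
        if s != r:
            merged.add((s, r))
    return merged


merger = IdentityMerger()
merger.merge("ali@yahoo.com", "ali@hotmail.com")  # assumed to be one person
edges = [("ali@yahoo.com", "sara@yahoo.com"), ("ali@hotmail.com", "sara@yahoo.com")]
print(merge_edges(edges, merger))
```

After merging, both of the sample edges collapse into a single connection between two identities, which is the form the social network analysis needs.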

Research Work

The main research work toward solving this problem is carried out in the following fields:

1. FPGA-based Data Collector for Gigabit-rate networks.

2. Hardware accelerator for Protocol Domain Parsing.

3. Hardware accelerator for Partial String Matching.

4. Software solution for parsing E-Mail tags.

5. A complete database design for information storage.

6. A software solution for socio-network communication analysis.

7. A software solution for packet sequencing on Gigabit-rate networks.

8. A software solution for gzip decompression on Gigabit-rate networks.

9. Manual protocol implementation at the TCP layer, with a retransmission scheme similar to Go-Back-N.
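As an illustration of how packet sequencing and gzip decompression fit together in software, the sketch below reorders segments by sequence number with a bounded out-of-order buffer and feeds the in-order bytes to a streaming gzip decompressor. It is a simplified sketch under stated assumptions, not the project's implementation: it ignores sequence-number wraparound and overlapping segments, and the class name, buffer limit, and sample data are hypothetical.

```python
import gzip
import zlib


class StreamReassembler:
    """Delivers payload bytes in order while buffering a bounded number of early segments."""

    def __init__(self, initial_seq, max_buffered=64):
        self.next_seq = initial_seq       # next byte offset expected in order
        self.max_buffered = max_buffered  # cap on out-of-order segments held
        self.pending = {}                 # sequence number -> payload

    def push(self, seq, payload):
        """Accept one segment and return the bytes that are now deliverable in order."""
        if not payload or seq < self.next_seq:
            return b""  # empty segment or a retransmission of delivered data
        if seq != self.next_seq and seq not in self.pending \
                and len(self.pending) >= self.max_buffered:
            return b""  # buffer full: drop and rely on retransmission
        self.pending[seq] = payload
        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)


def demo():
    """Send a gzip-compressed body as reversed segments and recover the original."""
    body = b"From: a@example.com\r\nTo: b@example.com\r\n" * 20
    compressed = gzip.compress(body)
    size = 16
    segments = [(off, compressed[off:off + size]) for off in range(0, len(compressed), size)]
    segments.reverse()  # deliver everything out of order

    reassembler = StreamReassembler(initial_seq=0)
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    recovered = b""
    for seq, payload in segments:
        recovered += decompressor.decompress(reassembler.push(seq, payload))
    recovered += decompressor.flush()
    return recovered == body


print(demo())
```

Setting max_buffered to 0 makes the receiver discard every out-of-order segment, which mirrors the receiver side of a Go-Back-N scheme; a larger value trades memory for fewer retransmissions.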