Information Exchange Portal

Authors: Dheshan M (VIT University), Aryan Jain (VIT University)

Abstract: This project discusses in detail the information required to host an Information Exchange
portal, which hosts information provided by one user and supplies it to other users. The portal
uses the AES encryption standard to encrypt user data and user-generated content, protecting both the
users’ private information and the portal’s IP (Intellectual Property). The project also discusses
countermeasures to prevent user abuse.

Keywords: Huffman Encoding, AES Encryption, Compression, Abuse Prevention Systems.

I. INTRODUCTION

In modern times, where security breaches have become commonplace, the need arises to conceal data
in more sophisticated ways. In this project, we tackle the problems faced in storing encrypted
data and the steps that need to be taken to ensure that the key remains secret and hidden from
prying eyes.
In any service-based application, it becomes necessary to protect the system from intentional abuse
and exploits by users. Since our project relies on user input to function, we also look at how
to distinguish genuine content from content that was typed at random to ‘game’ the system into
awarding more ‘points/Tokens’.

II. PROBLEM DESCRIPTION

In our proposed model, we create a database that holds the following information:
• User Information
o userID, email address, name and other relevant details.
o The number of virtual tokens the user holds.
• Information Portal
o Content the user wants to share.
o The date on which the information was posted.
o userID of the user who posted it.
o Tags to help classify the information the user has provided.
Our model operates as follows. Any person can sign up for our service by providing their details, for
which they receive a few initial starting credits (virtual Tokens). From there, they can either spend
credits to buy information that other people have posted, at a price in Tokens decided by the system
based on the character count of the information, or post their own information to our database and
receive more Tokens for it.
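
The records and the sign-up flow described above can be sketched as follows. This is a minimal sketch, not the portal's actual schema: the field names, the STARTING_TOKENS value and the purchase helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical number of credits granted at sign-up; the text does not state it.
STARTING_TOKENS = 10


@dataclass
class User:
    user_id: int
    email: str
    name: str
    tokens: int = STARTING_TOKENS  # every new user starts with a few credits


@dataclass
class Post:
    content: str
    posted_on: date
    user_id: int  # userID of the user who posted the content
    tags: List[str] = field(default_factory=list)


def purchase(buyer: User, price: int) -> None:
    """Deduct the price of a post from the buyer's balance."""
    if price > buyer.tokens:
        raise ValueError("not enough tokens")
    buyer.tokens -= price
```

Keeping the token balance on the user record means that both buying and posting reduce to a single update of one field.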

The number of Tokens deducted or earned when purchasing or posting information on the
Portal is determined by an algorithm that checks the quality of the post, to prevent
abuse of the system by typing random characters to inflate the character count and gain more credits.
The sensitive information from users (e.g., email addresses, mobile numbers) and the IP of the portal
(user-submitted information) need to be secured to ensure the system’s integrity and the privacy of its users.
To accomplish this, all sensitive content is encrypted using the AES algorithm.
The number of Tokens offered upon a successful post submission is evaluated after the
contents of the post are fed to a Huffman encoding algorithm, which converts the contents to a
binary representation. The Tokens to be awarded are calculated from the Huffman-encoded length and
the number of words in the post.

III. LITERATURE REVIEW

A. Need for Encryption
Encryption is needed in our project to accomplish the following goals, as stated by [1]. Encryption in its
very basic forms helps establish the following:
1. Confidentiality:
To ensure that only the intended recipient gets to read the information transmitted [1].

2. Authenticity/Authentication:
To prove that the person who claims to have written the message is the one who actually wrote and
transmitted it. It helps establish and verify the identity of the sender [1].

3. Data Integrity:
To check whether the data has been modified, intentionally or accidentally, by any third party without
the explicit authorization of the original sender [1]. This can be ensured by using hashing systems
on both ends of the communication to confirm the integrity of the data.

4. Non-Repudiation:
To prove that the message was in fact written and transmitted by the sender and was received by its
intended recipient, so that the sender cannot later deny having sent it [1].

5. Access Control:
It allows enforcing a set of policies that limit the powers and privileges of a user or a group of users
so that they have access only to the files that they need. This helps prevent unauthorized
use of the system [1].
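
As an illustration of the hashing approach mentioned under Data Integrity, the sketch below computes a SHA-256 digest on the sending side and recomputes it on the receiving side. The function names are hypothetical, and the choice of SHA-256 is an assumption, since the text does not name a hash function. A bare digest only detects modification if the digest itself reaches the recipient over a trusted channel.

```python
import hashlib
import hmac


def sha256_digest(message: bytes) -> str:
    """Return the SHA-256 digest of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def is_unmodified(message: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving end and compare it to the sender's."""
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(sha256_digest(message), expected_digest)


# The sender computes the digest; the receiver verifies it after transmission.
sent = b"content of a post"
digest_from_sender = sha256_digest(sent)
received_ok = is_unmodified(sent, digest_from_sender)
```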

B. Types of Encryption
Based on the number of keys used to encrypt and decrypt the data, encryption algorithms can be
classified into the following two categories:
1. Symmetric key Cryptography
2. Asymmetric key Cryptography
The symmetric key system is the more straightforward of the two. It uses a
single key to both encrypt and decrypt the data [1]. The same simplicity also leads to a very specific
problem regarding the transportation of the key: the recipient also has to know the key to decode
the data, and if the key is compromised, all of the encrypted data is compromised too,
as it can be decrypted using the same key [1].
This problem is solved by asymmetric key systems, which use different keys for
encrypting and decrypting. No secret key needs to be transported, because
the sender and the recipient each have their own unique set of keys [1].

C. Why AES over other Encryption Techniques?
The Advanced Encryption Standard (AES), also known as Rijndael, is an algorithm developed by the Belgian
cryptographers Vincent Rijmen and Joan Daemen. It was chosen as the AES by the United States Secretary
of Commerce even though the design of the cipher is publicly accessible. It was also approved by the NSA
for use in encrypting its confidential data [1, 2, 3].
Its selection as the AES was largely due to its performance efficiency
and the time required to break it by brute force. It was estimated that breaking a symmetric 256-bit key by
brute force requires 2^128 times more computational power than a 128-bit key [1]. Fifty supercomputers
that could check a billion billion (10^18) AES keys per second (if such a device could ever be made) would,
in theory, require about 3×10^51 years to exhaust the 256-bit key space. This makes it practically impossible
for anyone to brute-force the algorithm, which makes it one of the most secure options among symmetric-key
encryption algorithms [1].

D. How AES works?
AES works on a principle known as a substitution-permutation network and is highly efficient. For
the data to be encrypted using AES, it is split into blocks of a fixed size of 128 bits; the key is
128, 192 or 256 bits long [3]. Each block is then represented as a two-dimensional (4×4) matrix of bytes,
called the state, and various operations are performed on it for a fixed number of rounds [2].
The number of rounds depends on the key size chosen:
• 10 rounds for 128-bit keys.
• 12 rounds for 192-bit keys.
• 14 rounds for 256-bit keys.

Operations Performed
1. KeyExpansion: round keys are derived from the cipher key using Rijndael’s key schedule. AES
requires a separate 128-bit round key block for each round plus one more [2, 3].
2. Initial round key addition:
   1. AddRoundKey: each byte of the state is combined with a byte of the round key using
   bitwise XOR [3].
3. Repeated for 9, 11 or 13 rounds:
   1. SubBytes: a non-linear substitution step where each byte is replaced with another
   according to a lookup table [3].
   2. ShiftRows: a transposition step where the last three rows of the state are shifted cyclically
   a certain number of steps [3].
   3. MixColumns: a linear mixing operation which operates on the columns of the state,
   combining the four bytes in each column [3].
   4. AddRoundKey
4. Final round (making 10, 12 or 14 rounds in total):
   1. SubBytes
   2. ShiftRows
   3. AddRoundKey
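
The round structure listed above can be sketched as follows. This is an illustrative outline rather than a usable cipher: SubBytes, MixColumns and KeyExpansion are left out, the state is assumed to be a list of four rows of four byte values, and the function names are hypothetical.

```python
# Number of rounds for each supported key size (in bits).
ROUNDS = {128: 10, 192: 12, 256: 14}


def add_round_key(state, round_key):
    """XOR every byte of the state with the matching byte of the round key."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


def shift_rows(state):
    """Rotate row r of the state left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def round_plan(key_bits):
    """List the operations of each stage, in order, for the given key size."""
    total_rounds = ROUNDS[key_bits]
    plan = [["AddRoundKey"]]  # initial round key addition
    for _ in range(total_rounds - 1):
        plan.append(["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"])
    plan.append(["SubBytes", "ShiftRows", "AddRoundKey"])  # final round, no MixColumns
    return plan
```

Counting the AddRoundKey entries in a plan gives one more than the number of rounds, which matches the statement that KeyExpansion must produce one round key per round plus one more.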

IV. PROPOSED METHOD

The issues faced by content-delivery services driven by user-generated content are:
1. Data Security
2. User Abuse
To counter these issues, the following countermeasures can be employed.

A. Sophisticated Encryption Techniques

In common practice, small and medium-sized enterprises use centralized encryption servers, meaning that
all data has to be hauled to and from the server to be encrypted. The data therefore has to
travel longer distances, all the while being vulnerable to attacks or sniffing. This can be dangerous when
dealing with more sensitive data, as even internal networks can be invaded, and all data transfer lines
need to be secured to ensure confidentiality.

Figure 1. Current issue: there are open, unencrypted data channels within the network. These
unprotected channels can be tracked or sniffed out by any packet sniffing tool, which will in turn
reveal any data sent through the unsafe channel.

This problem can be mitigated by breaking up the centralized encryption system and
instead opting for a decentralized, local encryption server: one that resides on the client system and encrypts
all outgoing data to the main server. The main server and the client’s encryption server can be configured
to use the same keys, so that both the client and the server can read data freely without the need to transfer
keys. This completely eliminates the need for a dedicated encryption server, at the cost of a small overhead.

The next issue when it comes to encryption is the method used to store keys. Storing keys in
the same database as the data can be dangerous: when the system gets compromised, the key also gets
compromised. Ideally, this can be prevented by using separate machines to store the data and the key. Since
this could not be implemented in our project due to a lack of technical knowledge, we decided in favor of
adding another layer of security by encrypting the encryption keys using a different key that the user does not
have access to. This can be done in the back end without adding any overhead for the user, although the server
incurs a little more overhead.

B. Preventing User Abuse

Our project relies on giving out tokens to users for their fair contribution when they submit their ideas or any
information. The number of tokens awarded is in turn calculated programmatically from the number of characters
submitted. This method can be heavily abused by exploitative users, who can simply submit 1000 characters of random
letters and get the same number of Tokens as a genuine user who has put significant effort into producing a
quality post of 1000 characters.

Ideally, this problem can be solved by using compression techniques to reduce the size of the input, with
the dictionary used to compare and compress being as large as possible. Since compression is both IO- and
processor-intensive, our project uses a fair compromise in the form of ‘Huffman Encoding’. That is,
the value of the string is based not on the length of the string, but on its encoded length.
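
A minimal sketch of the idea, assuming the Huffman code is built per post from that post's own character frequencies (the text does not state how the code table is obtained). The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman code table (character -> bit string) for the text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct character still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {symbol: ""}) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text):
    """Return the Huffman-encoded text as a string of '0' and '1' characters."""
    codes = huffman_codes(text)
    return "".join(codes[ch] for ch in text)
```

With this sketch, a post drawn from a small set of characters encodes to fewer bits than a post of the same length that uses a wider range of characters, so its encoded length is the smaller of the two.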

V. RESULTS AND ANALYSIS

Let us put the proposed abuse-prevention method to the test. We consider two
strings of identical size as our test input. One is the control: a
proper post written by a user, with words and spaces resembling a typical English passage. The other
is made of random keystrokes; this test case represents people who aim to game the
system for Tokens by artificially inflating the character and word count in expectation of a greater reward.
We then compare the number of Tokens each algorithm awards to the user.
Test Case 1
abbsfasabafnadfn dndnanf ndffnddnsd fndfsfbdfdrbsrnaer abbsfas abafnadfn dndnanf ndffnddnsdfndf sfbdfdrbsrnaer abbsf asabafna dfn dndnanf ndffnddnsdfnd fsfbdfdrbsrnaer abbsfasabafnadfn dnd nanf ndffndd nsdfndfsfbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddnsd fndfsfbdfdrbsrnaer abbsfasa bafnadfn dndnanf ndffnddns dfndfsfbdfdrbsrnaer abbsfasab afnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddn sdfndfsfbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer abbsfas abafnadfn dndnanf ndffnd dnsdf ndfsfbdfdrbsrnaer abbsfasa bafnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer
Test Case 2 (Control)
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. This is a control.

Based on the above posts, it is clear that the Control post must get more Tokens than the
test post, which looks like a troll post made with the sole intent of milking the system for its tokens
without providing any valid information to the user. Therefore, the goal must be to detect such ‘troll’
posts, issue fewer tokens to them, and award more tokens to posts like the Control post in this
case.
Without using Huffman coding, the Tokens to be awarded are calculated based on the formula:
# of Tokens = length of post / # of words (calculated based on # of blank spaces)

When using Huffman encoding, the formula becomes:
# of Tokens = length of (huffmanEncoding(post)) / # of words (calculated based on # of blank spaces)
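
The two formulas can be compared side by side with a short sketch. Several details are assumptions, since the text does not fix them: the encoded length is measured in bits, the word count is taken as the number of blank spaces plus one, and the function names are hypothetical. The encoded length is computed as the sum of the merged weights in the Huffman tree, which equals the length of the encoded bit string.

```python
import heapq
from collections import Counter


def word_count(post):
    """Estimate the number of words from the number of blank spaces."""
    return post.count(" ") + 1


def huffman_encoded_bits(post):
    """Length in bits of the Huffman encoding of the post."""
    weights = list(Counter(post).values())
    if not weights:
        return 0
    if len(weights) == 1:
        return weights[0]  # one distinct character: one bit per occurrence
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged  # each merge adds one bit to every symbol beneath it
        heapq.heappush(weights, merged)
    return total


def tokens_without_huffman(post):
    return len(post) / word_count(post)


def tokens_with_huffman(post):
    return huffman_encoded_bits(post) / word_count(post)
```

Two posts with the same length and the same number of blank spaces always receive the same value from tokens_without_huffman, whereas tokens_with_huffman returns a lower value for the post that uses fewer distinct characters.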

           Word Value    Word Count    Using Huffman    Without Huffman
           Algorithm     Algorithm     Encoding         Encoding
Test       11.22         95            2                3
Control    5.063         50            3                3
Table 1. Consolidated results of the proposed method against other algorithms.
Here, the Word Value algorithm finds the worth of each word with respect only to its length; therefore,
the whole algorithm can be simplified to: # of characters / # of words. The Word Count algorithm, as the
name suggests, uses only the # of words as the factor in determining its result.

Figure 2. Algorithms v. Tokens Awarded
On observing Table 1, and as visualized in Figure 2, the proposed method
of using Huffman encoding performs best of the four, as it greatly reduces the Tokens awarded. It is
also flexible: its output can be adjusted (for example, by multiplying by constants to round off numbers),
whereas the other algorithms are rigid and cannot be changed much.

[Bar chart: Algorithm Viability Comparison. Series: Control, Test. Categories: Word Value, Word Count, Using Huffman, Without Huffman. Axis range: 0 to 100.]

REFERENCES
1. Kumari, S. (2017). A research Paper on Cryptography Encryption and Compression Techniques. International Journal
Of Engineering And Computer Science. https://doi.org/10.18535/ijecs/v6i4.20
2. Daemen, Joan; Rijmen, Vincent (March 9, 2003). “AES Proposal: Rijndael”. National Institute of Standards and
Technology.
3. Wikipedia. Advanced Encryption Standard. https://en.wikipedia.org/wiki/Advanced_Encryption_Standard