A Novel Method for Data Hosting and Load Balancing in Multi Cloud Environment

In recent years there has been a rapid movement of people towards online data hosting services, and many cloud service providers now offer them. Data hosting means storing data on a server or other computer so that it can be accessed over the internet. A company that needs particular resources only for a limited period of time does not have to purchase them. Cloud storage can be understood as a service model in which data is maintained, managed, and backed up remotely and made available to users over a network (typically the Internet).

Companies can use resources over a network on a pay-per-use basis. Cloud computing provides different types of services to users over the network and enables companies to consume resources as a utility, just like electricity. Data hosting services provide users with an efficient and reliable way to store data, and the stored data can be accessed from anywhere, on any device, and at any time. Cloud computing is internet-based computing that provides on-demand access to a shared pool of resources and data on a pay-per-use basis. It also provides a distributed environment, which is essential for developing large-scale applications rapidly. There are three main cloud-based storage architecture models:
·       Public
·       Private
·       Hybrid

The Public Cloud storage model provides a multi-tenant storage environment that is best suited to unstructured data. In this architecture, data is stored in global data centers and distributed across multiple regions. The Private Cloud storage model provides a dedicated environment in which the data is protected behind an organization’s firewall. Private clouds are appropriate for users who need more security for their data and more control over it. The Hybrid Cloud is a combination of a private cloud and third-party public cloud services.

The hybrid model offers flexibility and more data deployment options in the cloud. Recently, a growing number of customers have adopted the hybrid cloud model.

As data hosting services have become more popular in recent years, many cloud service providers have begun offering them. In most cases, companies move towards hosting their data in a single cloud. However, several options are now available in the market from various cloud vendors.

Heterogeneous clouds:

Cloud vendors vary in working performance and pricing policy. Their clouds are designed with different system architectures and apply different techniques to provide better services. As a result, customers find it hard to tell which clouds are suitable for hosting their data.

Hosting all the data of an organization in a single cloud exposes it to what is called vendor lock-in risk, and it is inefficient. A single cloud also does not provide guaranteed availability.

Multi Cloud data hosting:

Multi-cloud data hosting distributes data across multiple clouds to increase the availability of the data and to minimize the risk of data loss or system failure caused by the failure of a centralized component in a cloud computing environment.

Such a failure can occur in hardware, software, or infrastructure. A multi-cloud strategy also improves overall enterprise performance by avoiding potential risks such as “vendor lock-in”.

SYSTEM STUDY & ANALYSIS

EXISTING SYSTEM

In existing cloud data hosting systems, availability of data is usually guaranteed by replication or erasure coding. In the multi-cloud environment, the same two mechanisms are used to achieve distinct availability requirements, but they require different implementations. Replication achieves availability through redundancy: replicas are placed in several clouds, and a read is served by the “cheapest” cloud, the one that charges the least for out-going bandwidth and GET operations, unless that cloud is unavailable. Data replication is suitable for systems with distributed applications. With erasure coding, the data is divided into m data blocks and encoded into n blocks. The m data blocks and n-m coding blocks are placed into n different clouds.
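The cost trade-off between the two mechanisms can be illustrated with a small sketch. It is not the scheme's actual cost model: the `Cloud` class, the price fields, and the example prices are hypothetical, and it assumes that each erasure-coded block has size 1/m of the data and that an erasure-coded read fetches m blocks from the m clouds with the cheapest bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    name: str
    storage_price: float    # hypothetical price per GB stored
    bandwidth_price: float  # hypothetical price per GB of out-going traffic


def replication_cost(clouds, size_gb, reads):
    """Full replica in every cloud; each read is served by the cheapest cloud."""
    storage = sum(c.storage_price for c in clouds) * size_gb
    cheapest = min(c.bandwidth_price for c in clouds)
    return storage + reads * size_gb * cheapest


def erasure_cost(clouds, m, size_gb, reads):
    """n = len(clouds) blocks of size_gb / m each, one block per cloud.

    Assumption: a read fetches m blocks from the m cheapest-bandwidth clouds.
    """
    if not 0 < m <= len(clouds):
        raise ValueError("m must satisfy 0 < m <= n")
    block = size_gb / m
    storage = sum(c.storage_price for c in clouds) * block
    cheapest_m = sorted(c.bandwidth_price for c in clouds)[:m]
    return storage + reads * block * sum(cheapest_m)


clouds = [
    Cloud("a", 0.02, 0.10),
    Cloud("b", 0.03, 0.05),
    Cloud("c", 0.025, 0.12),
]
for reads in (0, 100):
    print(reads, replication_cost(clouds, 10, reads), erasure_cost(clouds, 2, 10, reads))
```

With these made-up prices, erasure coding is cheaper when the data is never read, because it stores less, while replication is cheaper for frequently read data, because every read goes to the single cheapest cloud.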

With erasure coding, data availability is guaranteed with less storage space than replication, but a read must access the multiple clouds that store the corresponding data blocks. Unlike a replicated read, an erasure-coded read therefore cannot be served by the cheapest cloud alone. In the multi-cloud scenario, bandwidth is generally (much) more expensive than storage space. Replication and erasure coding are both used there to meet different availability requirements, but their implementations are very different. The two problems related to multi-cloud hosting are:
·       How to choose appropriate clouds, in the presence of heterogeneous pricing policies, so as to minimize monetary cost.

·       How to meet the different availability requirements of different hosting services.

PROBLEM STATEMENT

To host data in a multi-cloud environment, users encounter two critical problems:
Ø  How to choose appropriate clouds in the presence of heterogeneous pricing policies to minimize monetary cost.
Ø  How to achieve different availability requirements to provide different services.
Ø  Monetary cost mainly depends on the usage of data, particularly the amount of storage capacity and the amount of network bandwidth consumed.
Ø  For the availability requirement, the consideration is which redundancy mechanism (i.e., replication or erasure coding) is more economical for specific data access patterns.
Ø  How to balance the load when multiple clouds are active.

Ø  How to identify the best data centre for hosting based on the given input. The selection is based on the resources currently allocated, the size of the data centre, the input file size, and the load on the centre.

PROPOSED SYSTEM

We propose a novel cost-efficient data hosting scheme with high availability in a heterogeneous multi-cloud environment, based on a predictor model. It intelligently places data into multiple clouds with minimized monetary cost and guaranteed availability. Specifically, we combine the two widely used redundancy mechanisms, i.e., replication and erasure coding, into a uniform model to meet the required availability in the presence of different data access patterns. Next, we design an efficient Predictor algorithm to choose proper data storage modes involving both clouds and redundancy mechanisms (ERREPLICA).

Existing systems focus mainly on combining the replication and erasure coding methods; they do not provide a specific method for the predictor. However, many prediction algorithms exist, such as the weighted moving average method. Some methods build a classifier to predict the access frequency of files.
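As a point of comparison, the weighted moving average method mentioned above can be sketched in a few lines. This is a generic illustration rather than part of the proposed predictor; the function name, the weights, and the sample access counts are assumptions made for the example.

```python
def weighted_moving_average(history, weights):
    """Predict the next value from the last len(weights) observations.

    weights[-1] applies to the most recent observation, so larger trailing
    weights make the forecast react faster to recent changes.
    """
    if not weights or len(history) < len(weights):
        raise ValueError("need at least len(weights) observations")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    recent = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / total


# Hypothetical daily access counts for one file.
accesses = [120, 80, 4, 8]
print(weighted_moving_average(accesses, [1, 3]))  # uses only the last two days
```

A file whose predicted access frequency is high would favour replication, and a rarely read file would favour erasure coding, which is the kind of decision the storage mode selection has to make.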

In the proposed method, we build a predictor using data mining algorithms. Because many data centers generate enormous log files, the input is huge, and we need an algorithm that can handle such data.

Advantages:
1.     Selects the best cloud for data hosting to balance the load, which is cost effective.
2.     Uses the replication mechanism for high availability.
3.     Handles bulk log information and quickly identifies the best cloud, which suits the cloud environment.
4.     Uses an efficient predictor to decide the storage mode and a suitable cloud data center.
5.     Saves monetary costs.

We use a split algorithm to predict a data centre with a lighter load for the next allocation. Once a data centre is allocated, we apply the ERREPLICA method based on the statistics given by the predictor.
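To make the idea of a split algorithm concrete, the sketch below evaluates candidate thresholds on one numeric attribute and keeps the one with the lowest weighted impurity. The Gini index is used as the splitting index here as an assumption, since the text does not name one, and the load values and the "low"/"high" class labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_numeric_split(values, labels):
    """Return (threshold, weighted_gini) for the best test `value <= threshold`.

    Returns None when all values are equal and no split is possible.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal attribute values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (pairs[i - 1][0], score)
    return best


# Hypothetical training data: current load (%) of a data centre and its class.
loads = [10, 20, 30, 80, 90]
classes = ["low", "low", "low", "high", "high"]
print(best_numeric_split(loads, classes))
```

For this toy data the best test is `load <= 30`, which separates the two classes completely. A full tree builder would apply the same evaluation recursively to each partition.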

// Stage 1: Predictor
MakeTree(Training Data T)
    Partition(T)

Partition(Data S)
    if (all points in S are in the same class) then return;
    Evaluate Splits for each attribute A;
    Use best split to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);

for each attribute A do
    traverse attribute list of A
    for each value v in the attribute list do
        find the corresponding entry in the class list, and hence the corresponding class and the leaf node l
        update the class histogram in the leaf l
        if A is a numeric attribute then
            compute splitting index for test (A ≤ v) for l
    if A is a categorical attribute then
        for each leaf of the tree do
            find subset of A with best split

// Stage 2: Distribute
// Choosing a storage mode and allocating to the best data centre.
The Algorithm
    Setup(n datacenters)
    Alloc(m)          // Allocate m blocks to each dc
    Compute load for each data center.
    Choose a datacenter based on load
    For k = 1 to n
        Check the availability of kth dc suitable for µ
        If µ = sflag
            Allocate to k
        Else
            Ealloc(n, µ)
    End

// Algorithm for partitioning and choosing a suitable cloud with least cost.
Ealloc(n, µ)
    // The output is the minimum cost C and the set of selected clouds H.
    1. C ← inf;
    2. H = {}         // initially empty
    3. Sort the clouds by S + µ   // Accessibility
    4. for m = 1 to n do
           A ← calculate the availability of G
           If A <= Amax then
               Mcost ← minimal cost.
           If Mcost

0. Results were tested using VMware Player 12.5.7 on a 64-bit machine.

The system was tested with 10 data centers set up using a simulator. In this paper we proposed a method for data hosting in a cloud environment. The method uses a predictor that determines a suitable data centre for cost-efficient allocation. The predictor uses a data mining split algorithm for prediction. We also used erasure coding and replication for secure storage.


