A Public IaaS Cloud Simulator

What is PICS?

PICS is a Public IaaS Cloud Simulator designed to evaluate the performance of both public IaaS clouds and cloud applications without actually deploying the applications on a cloud.

  • Main Capabilities:
    1. Assessing a wide range of properties of cloud services and cloud applications, including cloud cost, job response time, and resource utilization.
    2. Allowing PICS users to specify different workload types, including dynamic job arrival patterns and SLA requirements (e.g. job deadlines).
    3. Simulating a broad range of resource management policies, e.g., horizontal and vertical auto-scaling, custom job scheduling policies, and job failure cases.
    4. Enabling PICS users to evaluate the performance of different types of current public IaaS cloud configurations, such as a variety of resource types (VM instances and storage services), unique billing models, and performance uncertainty.

PICS Overview

PICS Architecture
  • Design Goal
  • The goal of PICS is to precisely simulate the behavior of public IaaS clouds from the cloud users' perspective, as if they had deployed a particular cloud application on public IaaS clouds.

    Design Challenges:
    • Precise modeling of the behavior of public cloud providers (including a variety of cloud resources, e.g., VM, storage, and network).
    • Precise modeling of the behavior of public cloud applications (including dynamic workload changes and performance uncertainty).
    • Precise modeling of the behavior of cloud users' resource management policies.
  • Simulation Inputs (PICS Configurations)
  • PICS inputs consist of three configuration files.
    • General PICS configurations (./config/config.txt)
      • Simulator Configurations
        • Public IaaS Configurations (Billing, VM, Storage, and Network)
        • Job Management Configurations (Scheduling, Failure Management)
        • VM Management Configurations (VM Selection, Scaling)
    • VM configurations (./config/vm_config.txt)
      • VM Type and Price
      • VM CPU and Network Performance
    • Workload configurations (./workload/workload.csv)
      • Job Arrival, Deadline, Duration, and Data Usage Information


    General PICS Configurations Detail (./config/config.txt)
    • Simulator Configurations
    • # A.  Simulation Configuration
      # A.1 Sim Trace Interval (e.g. every 60 sec)
      # A.2 Workload File Path
      # A.3 VM Configuration Path
    • Public IaaS Configurations
    • # B.  Public IaaS Configuration
      # B.1 VM Billing Config - Set IaaS Pricing Model (Hour or Min-based)
      # BTU_HOUR: Hourly Billing Model   (e.g. Amazon Web Services)
      # BTU_MIN : Minutely Billing Model (e.g. MS Azure & Google Compute Engine)
      # B.1 VM Billing Conf - Set Billing Time Period (Int and > 0)
      # --> 30min based billing
      # Note that Unit Cost($) for each VM is defined at VM_CONFIG_FILE
      # B.2 VM Startup Delay (Lagtime) - Min Startup LagTime for Creating a new VM
      # B.2 VM Startup Delay (Lagtime) - Max Startup LagTime for Creating a new VM
      # PICS determines startup lagtime for creating a new VM
      #      from MIN_STARTUP_LAG (>=0) to MAX_STARTUP_LAG (>=0)
      # C.  Cloud Storage Configurations
      # C.1 Max Volume of Cloud Storage
      #     Unit: Mega Bytes
      # C.2 Storage Usage Cost ($) for Gigabytes/Month
      #     STORAGE_UNIT_COST > 0
      # C.3 Storage Billing Time Unit (Second, > 0)
      #     1      : 1sec
      #     60     : 1 min
      #     3600   : 1 Hour
      #     86400  : 1 Day
      #     2592000: 1 Month
      # D.  Network Configuration
      # D.1 Network Bandwidth for Data Transfer (Unit: MB/s)
      # D.1.1 Bandwidth from Cloud to Cloud
      # D.1.2 Bandwidth for Incoming Traffic
      # D.1.3 Bandwidth for Outgoing Traffic
      # D.2 Network Cost for Data Transfer (Unit:$)
      # D.2.1 Network Cost from Cloud to Cloud
      # D.2.2 Network Cost for Incoming Traffic
      # D.2.3 Network Cost for Outgoing Traffic
    • Job Management Configurations
    • # E.  Job Management Configurations
      # E.1 Job Scheduling Configuration
      # E.2 Job Failure Configurations
      # E.2.1 Probability for Job Failure Occurrence
      #       0 <= PROB_JOB_FAILURE <= 1
      #       e.g. 0.05: 5%
      # E.3 Job Failure Recovery Policy
      #     JF-POLICY-01: ignore the failed job
      #     JF-POLICY-02: re-execute the failed job
      #     JF-POLICY-03: move the failed job to end of the job queue
      #     JF-POLICY-04: find another VM (running or new) to satisfy
      #                   the failed job deadline
    • VM Management Configurations
    • # F.  VM Management Configuration
      # F.1 VM Selection Policy for VM Scaling-up
      #     VM-SEL-COST     : Cost based VM Selection
      #     VM-SEL-PERF     : Performance based VM Selection
      #     VM-SEL-COSTPERF : Cost/Performance Balanced VM Selection
      # F.2 Max Num of Concurrent VMs
      #     > 0 or UNLIMITED
      # F.3 VM Scale Down Policy
      # SD-IM             : Immediate VM Scale-down when the VM is idle
      # SD-HR             : Hourly Billing Model-based Scale-Down (e.g. AWS)
      # SD-MN             : Minutely Billing Model-based Scale-Down (e.g. MS Azure)
      # SD-SL             : Startup-Lag based Scale-Down
      # SD-JAT-MEAN       : Mean Job Arrival Rate-based Scale Down
      # SD-JAT-MAX        : Maximum Job Arrival Rate-based Scale Down
      # SD-JAT-MEAN-RECENT: Mean Recent Job Arrival Rate-based Scale Down (kNN)
      # SD-JAT-MAX-RECENT : Max Recent Job Arrival Rate-based Scale Down
      # SD-JAT-SLR        : Simple Linear Regression (JAT)-based Scale Down
      # SD-JAT-2PR        : Quadratic Regression (JAT)-based Scale Down
      # SD-JAT-3PR        : Cubic Regression (JAT)-based Scale Down
      # SD-JAT-LLR        : Local Linear Regression (JAT)-based Scale Down
      # SD-JAT-L2PR       : Local Quadratic Regression (JAT)-based Scale Down
      # SD-JAT-L3PR       : Local Cubic Regression (JAT)-based Scale Down
      # SD-JAT-WMA        : Weighted Moving Average (JAT)-based Scale Down
      # SD-JAT-ES         : Exponential Smoothing (JAT)-based Scale Down
      # SD-JAT-HWDES      : Holt-Winters Double Exponential Smoothing (JAT)-based
      # SD-JAT-BRDES      : Brown's Double Exponential Smoothing (JAT)-based
      # SD-JAT-AR         : Autoregressive-based Scale Down
      # SD-JAT-ARMA       : Autoregressive and Moving Average-based Scale Down
      # SD-JAT-ARIMA      : Autoregressive Integrated Moving Average-based
      # F.4 VM Scale Down Policy Unit
      #     This configuration is only applicable for
      #     billing model based Scale Down
      #     (e.g. SD-HR and SD-MN)
      #          --> 10 min based Scale Down
      # F.5 Num of Recent Sample for SD Policies
      #     This configuration is applicable for RECENT-based SD policies.
      #     (e.g. SD-JAT-*-RECENT)
      # F.6 First Parameter for Timeseries.
      #     alpha for WMA, ES, HWDES, BRDES: 0 < alpha < 1
      #     p for AR, ARMA, ARIMA (p >= 0)
      # F.7 Second Parameter for Timeseries.
      #     beta for HWDES (0 < beta < 1)
      #     q for ARMA and ARIMA (q >= 0)
      # F.8 Third Parameter for Timeseries.
      #     d for ARIMA (d >= 0)
      # F.9 MIN/MAX for Wait Time of VM Scale Down
      # These MIN/MAX fields are related to predictive methods such as SD-JAT-SLR.
      # To handle wrong prediction results
      # --> too short (or negative) or too long wait time
      # F.10 Vertical Scaling
      #      Vertical Scaling - Enable: YES, Disable: No
      #      When enabling Vertical Scaling, MAX_NUM_OF_CONCURRENT_VMS
      #      shouldn't be UNLIMITED
      # F.11 Vertical Scaling Operation
      # VSCALE-UP   : Only VScale-up
      #               (triggered when VM cannot meet deadline for queued jobs)
      # VSCALE-DOWN : Only VScale-down
      #               (triggered when VM meets deadline - find most suitable one
      #               for queued jobs, e.g. cheapest VM with deadline satisfaction)
      # VSCALE-BOTH : Both VScale-up/down
      # F.12 Vertical Scaling Options
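The predictive scale-down policies (F.3) combine a time-series forecast of job arrivals with the F.6 parameter (alpha) and the F.9 MIN/MAX wait-time bounds. The exact rule PICS uses to turn a forecast into a wait time is not described here, so the following is only a minimal Python sketch of an SD-JAT-ES-style policy. It assumes that an idle VM waits for the forecast next job inter-arrival time before terminating, and that the wait is clamped to [MIN, MAX] to guard against too-short (or negative) and too-long predictions. The function names are hypothetical.

```python
from collections.abc import Sequence


def es_forecast(interarrivals: Sequence[float], alpha: float) -> float:
    """Simple exponential smoothing: level = alpha * x + (1 - alpha) * level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1 (see F.6)")
    if not interarrivals:
        raise ValueError("need at least one observed inter-arrival time")
    level = interarrivals[0]
    for x in interarrivals[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def scale_down_wait_time(interarrivals: Sequence[float], alpha: float,
                         min_wait: float, max_wait: float) -> float:
    """Hypothetical SD-JAT-ES rule: wait for the forecast inter-arrival time,
    clamped to the F.9 MIN/MAX bounds."""
    if min_wait > max_wait:
        raise ValueError("min_wait must not exceed max_wait")
    predicted = es_forecast(interarrivals, alpha)
    return max(min_wait, min(predicted, max_wait))
```

For example, scale_down_wait_time([60, 90, 120, 80], alpha=0.3, min_wait=30, max_wait=600) returns the smoothed inter-arrival time, since it already lies inside the bounds. The other SD-JAT-* policies would differ only in how the forecast is produced (regression, WMA, ARIMA, and so on); the clamping step stays the same.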
    VM Configurations Detail (./config/vm_config.txt)
    # Number of VM types (n) used in PICS simulation, n > 0
    # First VM Type Name
    # First VM Unit Price ($)
    # First VM CPU Performance
    # Used to calculate job duration on VM type
    # A lower CPU factor is better
    # First VM Network Performance
    # Used to calculate data transfer rate on VM type
    # A lower NET factor is better
    # Second VM Type Name
    # Second VM Unit Price ($)
    # Second VM CPU Performance
    # Second VM Network Performance
    # nth VM Type Name
    # nth VM Unit Price ($)
    # nth VM CPU Performance
    # nth VM Network Performance
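Each VM type in vm_config.txt carries a unit price, a CPU factor, and a NET factor, which is what the F.1 VM selection policies choose between. The following is a minimal Python sketch of how the three policies could rank VM types. The cost-based and performance-based rules follow directly from the field descriptions (lower price, lower CPU factor). The cost/performance score of price × CPU factor (roughly, price per unit of standard work) is an assumption, not necessarily what PICS computes. The VMType class and select_vm function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    unit_price: float  # $ per billing unit, as listed in vm_config.txt
    cpu_factor: float  # multiplier on standard job duration; lower is better
    net_factor: float  # multiplier for data transfer; lower is better


# Hypothetical scores for the F.1 policies; the smallest score wins.
SELECTION_KEYS = {
    "VM-SEL-COST": lambda vm: vm.unit_price,
    "VM-SEL-PERF": lambda vm: vm.cpu_factor,
    # Assumed cost/performance balance: price scaled by relative slowness.
    "VM-SEL-COSTPERF": lambda vm: vm.unit_price * vm.cpu_factor,
}


def select_vm(vm_types, policy):
    """Pick the VM type preferred by the given F.1 selection policy."""
    try:
        key = SELECTION_KEYS[policy]
    except KeyError:
        raise ValueError(f"unknown VM selection policy: {policy}") from None
    return min(vm_types, key=key)
```

With types small (price 0.10, CPU 2.0), medium (0.15, 1.0), and large (0.40, 0.5), VM-SEL-COST picks small, VM-SEL-PERF picks large, and this sketch's VM-SEL-COSTPERF picks medium, because it has the lowest price × CPU factor (0.15 versus 0.2 for the other two).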


    Workload Configurations Detail (./workload/workload.csv)
    • Workload file example
    • #job_submit_interval,job_duration,job_deadline,input_data,output_data
    • job_submit_interval: the interval between consecutive job generations (unit: PICS simulation clock, seconds).
    • e.g. 100,200,1000,NONE_0,NONE_0
      The first job will be generated at 100 simulation seconds.
      If the next job also has an interval of 100, it will be generated at 200 seconds
       ==> (100 seconds after the previous job generation).
    • job_duration: the standard job duration on a VM instance.
    • e.g. 100,200,1000,NONE_0,NONE_0
      The actual job duration on each VM is calculated as
      standard duration * that VM's CPU_FACTOR.
      e.g. the actual job duration on a VM with CPU_FACTOR=2.0 is 400 (200 * 2.0).
    • job_deadline: the deadline for the job (can be used as an SLA).
    • input_data: the input data the job processes, written as "DATA_TRANSFER_DIRECTION"_"DATA_SIZE" (size in megabytes).
    • e.g. 100,200,2500,IC_524288,NONE_0
      NONE_0: No input data (size is zero).
      IC_xxx: xxx megabytes of input data, transfer direction: Cloud => Cloud.
      OC_xxx: xxx megabytes of input data, transfer direction: Outside => Cloud.
    • output_data: the output data produced by the job, written as "DATA_TRANSFER_DIRECTION"_"DATA_SIZE" (size in megabytes).
    • e.g. 100,200,5000,OC_524288,OC_524288
      NONE_0: No output data (size is zero).
      IC_xxx: xxx megabytes of output data, transfer direction: Cloud => Cloud.
      OC_xxx: xxx megabytes of output data, transfer direction: Cloud => Outside.
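The workload format above is simple enough to load with Python's csv module. The following is a minimal sketch, not the PICS loader. It assumes that lines starting with "#" are comments, that job_submit_interval values accumulate into absolute generation times (as in the 100 → 200 example), and that the data fields always follow the DIRECTION_SIZE pattern with NONE, IC, or OC. The Job class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

VALID_DIRECTIONS = {"NONE", "IC", "OC"}


@dataclass(frozen=True)
class Job:
    submit_time: int   # absolute simulation clock (s): running sum of intervals
    duration: float    # standard duration; scaled by a VM's CPU_FACTOR
    deadline: float
    input_dir: str     # NONE, IC (Cloud => Cloud) or OC (Outside => Cloud)
    input_mb: int
    output_dir: str    # NONE, IC (Cloud => Cloud) or OC (Cloud => Outside)
    output_mb: int


def parse_data_field(field):
    """Split e.g. 'IC_524288' into ('IC', 524288)."""
    direction, sep, size = field.partition("_")
    if not sep or direction not in VALID_DIRECTIONS:
        raise ValueError(f"bad data field: {field!r}")
    return direction, int(size)


def load_workload(text):
    jobs, clock = [], 0
    for row in csv.reader(io.StringIO(text)):
        row = [cell.strip() for cell in row]
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the commented header
        interval, duration, deadline, input_data, output_data = row
        clock += int(interval)
        in_dir, in_mb = parse_data_field(input_data)
        out_dir, out_mb = parse_data_field(output_data)
        jobs.append(Job(clock, float(duration), float(deadline),
                        in_dir, in_mb, out_dir, out_mb))
    return jobs


def actual_duration(job, cpu_factor):
    """Actual duration on a VM = standard duration * that VM's CPU_FACTOR."""
    return job.duration * cpu_factor
```

Loading the two lines "100,200,1000,NONE_0,NONE_0" and "100,200,2500,IC_524288,NONE_0" gives jobs generated at 100 and 200 seconds. Because the standard duration is 200, actual_duration(job, 2.0) returns 400.0 for either job.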


  • Simulation Results
  • PICS produces six report files after a simulation completes. These reports provide real-time traces and detailed simulation results for workload processing and for resource (VM/Storage/Network) usage and cost.
    (*) You can find the report files at "Logs/pics_log-YYYY-MM-DD-hh-mm-ss/Report/"
    (*) In most cases, the following three result files are the most important:
    • 1.report_simulation_trace_broker.csv
    • 3.report_job_complete_report.csv
    • 4.report_vm_usage_report.csv
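Because each run writes to a timestamped directory, a small helper can locate the most recent run and the three key report files. The following is a minimal Python sketch. It assumes the directory names follow "pics_log-YYYY-MM-DD-hh-mm-ss" exactly, with a 24-hour hour field, and it ignores any other entries under Logs. The helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path

RUN_DIR_FORMAT = "pics_log-%Y-%m-%d-%H-%M-%S"
KEY_REPORTS = (
    "1.report_simulation_trace_broker.csv",
    "3.report_job_complete_report.csv",
    "4.report_vm_usage_report.csv",
)


def latest_run(dir_names):
    """Return the newest run directory name; names not matching the pattern are skipped."""
    stamped = []
    for name in dir_names:
        try:
            stamped.append((datetime.strptime(name, RUN_DIR_FORMAT), name))
        except ValueError:
            continue
    if not stamped:
        raise FileNotFoundError("no pics_log-* run directories found")
    return max(stamped)[1]


def key_report_paths(logs_root, run_name):
    """Map each key report file name to its path inside <run>/Report/."""
    report_dir = Path(logs_root) / run_name / "Report"
    return {name: report_dir / name for name in KEY_REPORTS}
```

Typical use: collect the directory names with [p.name for p in Path("Logs").iterdir() if p.is_dir()], pass them to latest_run, and then open the paths returned by key_report_paths.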

    • 1.report_simulation_trace_broker.csv provides
      • real-time trace for incoming workloads (e.g. JOB_RECV(CUMM) and JOB_RECV(UNIT)).
      • real-time trace for workload completion (e.g. JOB_COMP(CUMM) and JOB_COMP(UNIT)).
      • real-time trace for VM usage status and cost (e.g. VM_*).
      • Meaning of all attributes for 1.report_simulation_trace_broker.csv
        1. CLOCK: Simulation clock.
        2. JOB_RECV(CUMM): The cumulative number of jobs received up to the current simulation clock.
        3. JOB_RECV(UNIT): The number of jobs received at the current simulation clock.
        4. JOB_COMP(CUMM): The cumulative number of jobs completed up to the current simulation clock.
        5. JOB_COMP(UNIT): The number of jobs completed at the current simulation clock.
        6. VM_RUN: The number of currently running VMs (VM_STUP + VM_ACT), i.e., VMs that are starting up or active; stopped VMs (VM_STOP) are not included.
        7. VM_STUP: The number of currently starting up VMs.
        8. VM_ACT: The number of currently active VMs.
        9. VM_STOP: The number of currently stopped VMs.
        10. VM_COST($): The accumulated VM cost at the simulation clock.
    • 2.report_simulation_trace_iaas.csv provides
      • real-time trace for cloud storage usage and cost.
      • real-time trace for network usage and cost.
      • Meaning of all attributes for 2.report_simulation_trace_iaas.csv
        1. CLOCK: Simulation clock.
        2. # SC: The number of storage containers (e.g. S3 buckets).
        3. # SFO: The number of storage file objects (e.g. total # of files in all S3 buckets).
        4. ST_SIZE (KB): The current size (Kilo Bytes) of cloud storage (e.g. S3) at the simulation clock.
        5. ST_COST ($): The current cost of cloud storage (e.g. S3) at the simulation clock.
        6. NET-IN (KB): The amount of data transmission (outside of clouds --> clouds (IaaS data center)) at the simulation clock.
        7. NET-OUT (KB): The amount of data transmission (clouds (IaaS data center)--> outside of clouds) at the simulation clock.
        8. NET-CLOUD (KB): The amount of data transmission (clouds <--> clouds in the same data center) at the simulation clock.
        9. NET-IN_COST ($): The network cost for item #6.
        10. NET-OUT_COST ($): The network cost for item #7.
        11. NET-CLOUD_COST ($): The network cost for item #8.
        12. NET_COST ($): The total cost for network usage: item #9 + item #10 + item #11.
    • 3.report_job_complete_report.csv provides
      • detailed information for each workload processing.
        • CPU time -- e.g., item #5: CPU
        • Network time -- e.g., item #4 and #6: IN/OUT
        • Deadline satisfaction -- e.g., item #15: DF
        • Total duration -- e.g. item #13: TD and item #14: RT
        • Cost for each workload processing -- e.g. item #16: CO($)
        • Meaning of all attributes for 3.report_job_complete_report.csv
          1. ID: Job ID.
          2. JN: Job Name.
          3. ADR: Actual Job Duration.
          4. IN: Network time for transferring data to the VM before the job is processed.
          5. CPU: CPU time for the job processing.
          6. OUT: Network time for transferring data from the VM to outside the VM after the job is processed (e.g. output file data transfer).
          7. DL: Job deadline.
          8. VM: Assigned VM ID for this job processing.
          9. TG: Job generation time (job ingress time).
          10. TA: Time for job assignment to the particular VM.
          11. TS: Time for job processing start.
          12. TC: Time for job processing completion.
          13. TD: Total duration for job processing. (TC - TG)
          14. RT: Job runtime. (TC - TS)
          15. DF: Difference from job deadline
            • Positive value: job deadline satisfaction.
            • Negative value: job deadline miss.
          16. CO($): Cost for this job processing.
          17. ST: Job state.
            • JOB_ST_COMPLETED (3004): this job is successfully completed.
            • JOB_ST_FAILED (3005): this job failed.
    • 4.report_vm_usage_report.csv provides
      • detailed information for VM usage.
        • VM usage cost -- e.g. item #3: CO($)
        • VM running time -- e.g. item #2: RT
        • VM utilization -- e.g. item #13: UT
        • # of processed jobs -- e.g. item #11: NJ
        • Vertical scaling decision -- e.g. item #18, #19, and #20.
        • Meaning of all attributes for 4.report_vm_usage_report.csv
          1. VMID: VM (Virtual Machine) ID.
          2. RT: VM runtime.
          3. CO($): VM Cost.
          4. IID: VM Instance ID.
          5. TY: VM Type. (e.g. m3.xlarge)
          6. ST: VM State.
            • VM_ST_CREATING (3101): This VM is currently being created.
            • VM_ST_ACTIVE (3102): This VM is currently running (active).
            • VM_ST_TERMINATE (3103): This VM is terminated.
          7. TC: Time when the VM was created.
          8. TA: Time when the VM was activated.
          9. TT: Time when the VM was terminated.
          10. SL: Startup lag time for this VM.
          11. NJ: The number of jobs processed by this VM.
          12. JR: Job runtime on this VM.
          13. UT: VM Utilization (e.g. 0.9 = 90% of utilization)
          14. SR: Startup portion of total VM running time.
          15. ID: Idle portion of total VM running time.
          16. LJCT: Simulation clock for the last job completion on this VM.
          17. SDWT: Scale down wait time -- wait time before termination of this VM.
          18. IS_VS_VICTIM: True if this VM is a victim of vertical scaling-up; False if it is not eligible for vertical scaling.
          19. VS_CASE: Case for vertical scaling.
          20. VS_VICTIM_ID: -1 if this VM is not related to vertical scaling; otherwise, this VM is part of a vertical scaling case and this field identifies the victim of that vertical scaling.
    • 5.report_storage_usage.csv provides
      • detailed information for Cloud Storage usage, including storage usage time, cost, and volume size.
      • Meaning of all attributes for 5.report_storage_usage.csv
        1. If TY is SC: this is information for storage container. (e.g. S3)
          • ID: Storage Container ID. (e.g. S3 ID)
          • CR: ID of the job that created this storage container.
          • TR: The simulation entity that terminates this storage container.
          • RG: Storage container region.
          • PM: Permission for this storage. (e.g. SC_PERMISSION_PUBLIC (4001): public, SC_PERMISSION_PRIVATE (4002): private, SC_PERMISSION_GROUP (4003): group permission)
          • ST: State for the storage container.
          • CT: Time for creation.
          • DT: Time for deletion.
          • DR: Duration for which this storage container is active.
          • NF: The number of stored files.
          • VL(KB): The volume for storage container.
          • CO($): Cost for this storage container.
        2. If TY is SFO: this is information for a file object in a particular storage container.
          • ID: File object ID.
          • SC: Storage container ID.
          • SZ(KB): File object size (KB).
          • ON: ID of the job that created this file.
          • ST: File status.
          • DST: Data status.
          • PSZ(KB): File (planned) size.
          • CT: File creation time.
          • AT: File activation time.
          • DT: File deletion time.
          • DR: File active duration.
          • CO($): File storage cost.
    • 6.report_network_usage.csv provides
      • detailed information for Network usage including network cost for incoming/outgoing data transfer.
      • Meaning of all attributes for 6.report_network_usage.csv
        1. JOBID: Job ID for network usage.
        2. IN_TS(KB): Input file size.
        3. IN_DR: Input file flow direction. (e.g., IFTD_IC: input file is from inside clouds, IFTD_OC: input file is from outside of clouds.)
        4. IN_COST($): Network usage cost for input file.
        5. OUT_TS_PLANNED(KB): Output file size (planned).
        6. OUT_TS_ACTUAL(KB): Output file size (actual).
        7. OUT_COST($): Network usage cost for output file.
        8. TOTAL_COST($): Total network cost for input/output files (IN_COST($) + OUT_COST($))
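The attributes of 3.report_job_complete_report.csv are enough to compute a quick run summary: the deadline satisfaction ratio from DF, the mean total duration from TD, and the total job cost from CO($). The following is a minimal Python sketch, not part of PICS. It assumes that the first row of the report holds exactly the attribute names listed above, that ST is written either as 3004 or as JOB_ST_COMPLETED, and that DF == 0 counts as meeting the deadline. As a sanity check, it also verifies the documented identities TD = TC - TG and RT = TC - TS. The function name is hypothetical.

```python
import csv
import io
import math

# ST may be written as the numeric code or the symbolic name (assumption).
COMPLETED_STATES = {"3004", "JOB_ST_COMPLETED"}


def summarize_job_report(text):
    """Summarize the contents of a 3.report_job_complete_report.csv file."""
    rows = list(csv.DictReader(io.StringIO(text), skipinitialspace=True))
    completed = [r for r in rows if r["ST"].strip() in COMPLETED_STATES]
    for r in completed:
        tg, ts, tc = float(r["TG"]), float(r["TS"]), float(r["TC"])
        # Documented identities: TD = TC - TG and RT = TC - TS.
        if not (math.isclose(float(r["TD"]), tc - tg, abs_tol=1e-6)
                and math.isclose(float(r["RT"]), tc - ts, abs_tol=1e-6)):
            raise ValueError(f"inconsistent timestamps for job {r['ID']}")
    n = len(completed)
    # Positive DF = deadline met, negative = missed; DF == 0 treated as met here.
    met = sum(1 for r in completed if float(r["DF"]) >= 0)
    return {
        "jobs": len(rows),
        "completed": n,
        "deadline_met_ratio": met / n if n else 0.0,
        "mean_total_duration": sum(float(r["TD"]) for r in completed) / n if n else 0.0,
        # Cost is summed over all jobs, including failed ones.
        "total_cost": sum(float(r["CO($)"]) for r in rows),
    }
```

Pass the function the full text of the report, e.g. summarize_job_report(open(path).read()). Failed jobs (JOB_ST_FAILED) are counted in "jobs" and "total_cost" but excluded from the deadline and duration statistics.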


    PICS Validation Results

    To validate the correctness of PICS simulations, we compared PICS against a real-world cloud application running on Amazon Web Services.