Thursday, January 17, 2013

Software Guidelines: Performance, Scalability, Transaction Management

This is part of a blog series about (SOA) software guidelines. For the complete list of guidelines (about design, security, performance, operations, database, coding, versioning, among others) please refer to:

Performance, Scalability, Transactions

  • Be aware of the performance impact on other consumer services/applications and on the whole system (end-to-end performance).
  • Can you classify the service level of the operations, e.g. long-running (processStudentFraternityForm) vs. time-critical (buyStockOptions)?
  • Choose your transport carefully, e.g. Weblogic T3 is faster than HTTP, but HTTP is more interoperable (firewalls/routers/load-balancers can handle HTTP but not T3).
  • Relax reliability requirements in exchange for performance, e.g.:
    • don't use persistent/durable JMS
    • don't use durable BPEL processes
    • don't use QoS exactly-once / WS-ReliableMessaging
    • don't use 2PC/XA distributed transactions; use XA relaxations such as XA Last Resource (e.g. Oracle LLR) or non-XA (e.g. best effort)
  • Don't use (web service / JMS) handlers if they're not necessary
  • Regularly audit & remove unnecessary services/processes/databases, and back up & remove virtual machines which you don't use anymore (e.g. when the development/test phase has finished). Reducing the number of running components improves performance and reduces the chance of faults.
  • For future scalability, scale the capacity during design (20x), implementation (3x), deployment (1.5x). These numbers are just an indication; the real numbers depend on how fast your service will grow in the coming 1-2 years.
  • Addressing end-to-end performance (profiling, then reworking/tuning only the components that have bottlenecks) is more efficient than tuning the performance of every component.
  • Use throttling.
  • Prioritize your effort regarding software performance: the most important area is the application/query design/implementation (e.g. avoid select * & Cartesian products). The next focus candidates are (in order of priority): database design (e.g. normalization/denormalization, indexes), hardware (e.g. disk defragmentation), database-engine tuning (e.g. SQL Server) and OS tuning (e.g. shared memory parameters).
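
The "use throttling" advice above can be sketched in plain Java with a semaphore. This is a minimal illustration (the class and method names are my own, not from any particular framework): admit at most N concurrent requests and reject the rest immediately instead of letting them pile up.

```java
import java.util.concurrent.Semaphore;

// Minimal throttling sketch: a semaphore caps concurrent work.
// Callers that exceed the limit are rejected right away (fail fast),
// which protects the downstream resource from overload.
public class Throttle {
    private final Semaphore permits;

    public Throttle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Returns true if the request is admitted; the caller must call release() when done. */
    public boolean tryAcquire() {
        return permits.tryAcquire();
    }

    public void release() {
        permits.release();
    }
}
```

A caller would wrap each request in `if (throttle.tryAcquire()) { try { … } finally { throttle.release(); } } else { /* reject or queue */ }`. Real middleware (load balancers, JMS quotas) offers the same idea as configuration rather than code.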
Distributed computing and transactions:
  • First of all, if possible avoid distributed systems, to avoid problems with concurrency and data integrity/synchronization
  • Reduce the number of transactions by combining operations from different transactions into one transaction if possible.
  • If objects are read-only, they don't have to be synchronized/thread-safe. For the sake of performance, don't apply thread safety by default.
  • Can the process be done in parallel? Parallelism not only improves performance but also availability / fault tolerance. Do a cost-benefit analysis of the overhead of parallel processing, e.g. parallel processing of small data chunks is not effective. Use a parallel algorithm (e.g. map-reduce) if necessary.
  • Minimize lock times:
    • Hold locks only for code that needs atomicity (e.g. read-only code & preprocessing can be done before/outside the transaction)
    • Tune/optimize the performance of the concurrent processes in the transaction (e.g. using partitioning), since the faster the concurrent processes, the smaller the chance of bottlenecks due to lock contention
    • Use optimistic concurrency strategies (but then you need to take care of conflict detection & resolution)
    • Avoid waiting for user responses during a transaction
    • Minimize calls outside the boundary (e.g. invoking an external cloud web service) during a transaction. If possible do the call before/after the transaction.
    • Use compensation instead of relying on transactional rollback if the cost of a long transaction is too expensive.
    • Lock: acquire late, release early.
    • Split big transactions/batch jobs: divide & conquer. If a fault occurs, you don't have to restart the whole big job.
  • Relax the isolation level if possible (trade off read consistency for performance)
  • Since for most database usage reads are more prevalent than updates, you can parallelize reads using multiple slaves, as in MySQL master-slave replication. You update only the master and then synchronize the slaves with the master. Backups & data analysis can run on the slaves, which eases the burden on the master.
  • Whenever possible use system transactions (which rely on your application framework or database) instead of building your own business transactions, since offline-concurrency management is difficult/expensive to implement and error prone.
  • Classify transactions into a light type (critical / needs fast response, e.g. login) and a heavy type (less frequent complex transactions where more latency is tolerated, e.g. generateReport). Set different target transactions per second for each type.
  • Use deadlock detection & resolution (e.g. using the Oracle database EM console). On many platforms you can define the timeout of (2PC) transactions (e.g. Weblogic console: domain JTA config).
  • Balance the granularity of (database) locks: coarse-grained locks (more contention/locking problems) vs. fine-grained locks (more processing overhead), e.g. do you lock the whole database table or only a row?
  • The scalability of clusters & load balancing: can you easily add more nodes to the clusters?
  • If you use multi-threading, make sure that there is no performance loss due to locking problems and no deadlock.
  • Different ways to split application servers, database/storage, networks:
    • clone the same instances (x-axis... using the terminology from Abbott & Fisher's book)
    • split different actions / services (y-axis), e.g. getEmployee, updateEmployee, getStudent
    • split similar things (z-axis) based on customer (names beginning with A-M & N-Z) or geography (Europe mirror, US mirror/cache).
    • Splitting improves performance, caching (reducing cache sizes in case of a y-axis / z-axis split) and fault isolation. In terms of transaction growth, y-axis splitting yields less scalability benefit than x-axis, but y-axis splitting can help to scale code-complexity (CPU) / data (memory) growth.
  • When you need to use distributed systems (e.g. database clusters, for performance), be aware of Brewer's CAP constraints: you can't achieve Consistency, high Availability and Partition-tolerance all at the same time. So relax the constraints, e.g. use BASE (Basic Availability, Soft state, Eventual consistency) to relax ACID consistency. Recognize which CAP relaxation is used by your platform (e.g. Google App Engine/Hadoop prioritizes CA, Amazon S3 prioritizes AP).
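
The optimistic-concurrency bullet above (with its conflict detection & resolution requirement) can be sketched as a versioned record: an update succeeds only if the version the writer read is still current. This is a minimal, in-memory illustration with names of my own choosing, not a real ORM's API (JPA's `@Version` implements the same idea over a database column).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic concurrency sketch: each record carries a version number.
// A writer reads the value and its version, computes a new value outside
// any lock, and the write is accepted only if nobody updated in between.
public class OptimisticRecord {
    private volatile String value;
    private final AtomicInteger version = new AtomicInteger(0);

    public OptimisticRecord(String initial) { this.value = initial; }

    public int currentVersion() { return version.get(); }
    public String read() { return value; }

    /** Conflict detection: reject the write if the version has moved on. */
    public synchronized boolean update(int expectedVersion, String newValue) {
        if (version.get() != expectedVersion) return false; // conflict: caller must re-read & retry
        value = newValue;
        version.incrementAndGet();
        return true;
    }
}
```

The resolution strategy here is the simplest one (the losing writer re-reads and retries); richer strategies merge the conflicting changes instead.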
Reduce network traffic:
  • Granularity/message-size tradeoff: use coarse-grained messages (avoid small messages / chatty communications) but avoid too-large / too-complex XML messages that will overwhelm the server.
  • Wrap chatty interfaces with coarse-grained interfaces (e.g. the remote facade & DTO patterns), especially for costly remote calls (inter-process / inter-machine).
  • Put the processing closer to the resources it uses (e.g. PL/SQL)
  • Batch work, e.g. stored procedures in the database, to reduce communication between the database and the business-logic layer. You can also run multiple SQL queries using semicolons.
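
The batching idea above can be sketched generically: instead of one remote call per item, buffer items and flush them in coarse-grained batches. This is an illustrative sketch with hypothetical names; in JDBC the same pattern is `addBatch()`/`executeBatch()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Batching sketch: items are buffered and handed to the flush action
// (e.g. one bulk INSERT or one coarse-grained service call) in groups,
// cutting the number of round-trips roughly by the batch size.
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> buffer = new ArrayList<>();
    private int flushCount = 0;

    public Batcher(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    /** Flush whatever is buffered (call once more at the end of the job). */
    public void flush() {
        if (buffer.isEmpty()) return;
        flushAction.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushCount++;
    }

    public int flushes() { return flushCount; }
}
```

With a batch size of 100, 10,000 items cost 100 round-trips instead of 10,000; the trade-off is latency for the items waiting in the buffer.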

Minimize data:
  • Minimize data (for performance), e.g. when synchronizing between systems export only the changed data (using change events) instead of exporting a huge bulk of data every day.
  • After several iterations of redesign & recoding, you might have to refactor the data model again to minimize data size. Due to feature creep you might have added too much data to your database DAOs or JMS messages.
  • Minimize GUI page size & images
  • Use HTTP compression; this is especially important if SSL (encryption/decryption) and serialization/marshalling/transformation are needed, since the bigger the data, the higher the processing cost.
  • Keep calls inside the same process & machine to reduce serialization & network overhead. You might have to trade this off against reusability and scalability (e.g. SOA grid architecture).
  • Serialize/transmit only the data you need to. Some data is only needed temporarily (transient data). Some data is barely needed in the future (e.g. a yearly report, once a year) and can be easily derived from other persisted data.
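
The "export only the changed data" bullet can be sketched as simple change tracking: record which keys were touched since the last sync and export only those, instead of the whole dataset. A minimal in-memory illustration (names are my own):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Change-tracking sketch: every write records a change event (the dirty key).
// A sync then exports only the delta, not a huge daily bulk dump.
public class ChangeTracker {
    private final Map<String, String> data = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    public void put(String key, String value) {
        data.put(key, value);
        dirty.add(key); // the change event
    }

    /** Export only the entries changed since the last export, then reset. */
    public Map<String, String> exportDelta() {
        Map<String, String> delta = new HashMap<>();
        for (String key : dirty) delta.put(key, data.get(key));
        dirty.clear();
        return delta;
    }
}
```

In a real system the change events would typically flow through a queue or a database change-data-capture feed rather than an in-memory set.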

Resource management:
  • Periodically shrink the connection pool to free up connections that are no longer needed.
  • Use timeouts for resources (e.g. the database connection pool). The faster the timeout, the faster the load balancer can detect a failed server and reassign its tasks to other servers, and the faster the node manager (in the case of Weblogic) can restart the failed server for failover.
  • Create/load objects on demand (e.g. lazy loading)
  • Cache/pool to reuse resources which are expensive to create (e.g. JMS connections, database connections, sessions, threads in a thread pool).
  • If you have a high-think usage pattern (small transactions with thinking pauses, e.g. form submissions with small data) you might consider using a shared server to share resources among multiple connections, thus improving scalability. But for high-traffic, big-data usage patterns it is better to use a dedicated process.
  • Consider the number of available resource connections when determining the thread pool size. Avoid bottlenecks caused by having more threads than available resources.
  • Connection/resource: acquire late & release early
  • Clean up your resources after the operation finishes (e.g. close files/connections, run destructors, dereference pointers, invalidate sessions)
  • Cache to reduce processing & network time costs, especially for data that is expensive to query/compute/render. Good cache candidates: data in the presentation layer often used for static web pages, and data in the database layer generated by stored procedures/queries often called by the business logic.
  • Decide where to keep cache data: in memory for short periods, or in a database for long periods. To reduce network round-trips, implement the cache in the same layer/machine as the processing (e.g. cache in the presentation layer if the GUI often reuses the same data).
  • For scalability, avoid per-user caches in the server, which will cost enormous memory as the number of sessions grows. If you really need to cache per user, use a client-side cache instead, but with limited data; use encryption & validation if necessary.
  • Validate using performance tests that the cache mechanism is indeed improving performance instead of just adding extra processing (e.g. too much cache memory overhead, too-frequent updates)
  • Select a cache key that is not too long but is discriminative enough
  • If you need to transform the cached data, transform it before caching to avoid reprocessing
  • Avoid caching data that needs to be synchronized across servers (coherence mechanisms are expensive)
  • Avoid caching volatile / frequently changing data. For certain applications, you may relax the freshness constraint (e.g. refresh the cache only every minute instead of every millisecond).
  • Determine the appropriate interval to refresh the data (too long: stale-data problem; too short: performance cost)
  • Decide on a refresh/eviction strategy, e.g. least recently used, least frequently used, fixed interval
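
The fixed-interval refresh strategy above can be sketched as a TTL (time-to-live) cache: an entry is reloaded only when it is older than the refresh interval, trading a bounded amount of staleness for fewer expensive loads. A minimal sketch (names are my own; the current time is passed in explicitly to keep it deterministic):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// TTL-cache sketch: entries expire after ttlMillis and are then reloaded
// via the loader (standing in for an expensive query/computation).
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final long ttlMillis;
    private final Function<K, V> loader;
    private final Map<K, Entry<V>> map = new HashMap<>();
    private int loads = 0; // how many expensive loads actually happened

    public TtlCache(long ttlMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    public synchronized V get(K key, long nowMillis) {
        Entry<V> e = map.get(key);
        if (e == null || nowMillis - e.loadedAt >= ttlMillis) { // missing or stale
            e = new Entry<>(loader.apply(key), nowMillis);
            map.put(key, e);
            loads++;
        }
        return e.value;
    }

    public int loads() { return loads; }
}
```

Within one TTL window, repeated gets cost one load; a performance test (as the bullets advise) should confirm the hit rate actually justifies the cache.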

State management:
  • Prefer stateless systems (lower cost of memory, processing and network bandwidth; more scalable / less need for synchronization between clusters). So avoid statefulness (passing session data back and forth in every communication) to reduce communication bandwidth and session/server affinity (which hinders scalability). However, stateful communication sometimes can't be avoided (e.g. a shopping cart or business-process conversation) and can save performance (e.g. avoiding reauthenticating or requerying data for the same user in every request).
  • Saving the session data in the client (e.g. using HTML5 storage) is more scalable in terms of server memory if you have many clients, and client-side session data avoids the need for data replication/synchronization between servers. But client session data has disadvantages: security risk (so use encryption) and increased communication bandwidth (so minimize the data). Another alternative: a dedicated resource manager that stores the state (e.g. Oracle Access Manager).
  • You can save the session data in centralized storage to avoid the need for data replication between servers.
  • If you save session data in the server or a database, you need to implement clean-up in case the user cancels (with a timeout) and data persistence (in case of a system crash)
  • Minimize session data, especially if you need to save it in a shared resource (the bigger the data, the more chance of contention problems.)
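
The client-side session bullet notes the security risk of state stored on the client. One common mitigation, sketched minimally below with hypothetical names, is to sign the session data with an HMAC so the server can detect tampering (encryption would additionally hide the contents; this sketch only covers integrity):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Signed-session sketch: session data lives on the client, but carries an
// HMAC-SHA256 tag computed with a server-side secret. The server accepts
// the data only if recomputing the tag yields the same token.
public class SessionToken {
    private final SecretKeySpec key;

    public SessionToken(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    public String sign(String sessionData) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] sig = mac.doFinal(sessionData.getBytes(StandardCharsets.UTF_8));
            // Base64 alphabet contains no '.', so '.' safely separates data and tag.
            return sessionData + "." + Base64.getEncoder().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True only if the token was produced with our secret and is unmodified.
     *  (A production version should compare in constant time.) */
    public boolean verify(String token) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return false;
        return sign(token.substring(0, dot)).equals(token);
    }
}
```

This keeps the server stateless with respect to the session while still letting it trust the data, at the cost of a slightly larger payload per request.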
JMS / Messaging system:
  • Use message quotas/limits
  • Don't use guaranteed-delivery / durable / transactional JMS if it's not necessary
  • When choosing between file and database persistence for JMS QoS: you need to do a performance test to decide, since it depends on many factors such as message sizes, OS, file system (e.g. RAID level) and database tuning. In general no-sync (not waiting for an ack from the file system) file persistence is faster than database or sync-file persistence.
  • When you use guaranteed-delivery JMS: send an ack for a group of messages (bigger granularity)
  • Prevent large backlogs by deleting expired messages, using async consumers, and monitoring the queue so you can stop further inputs before the system gets overwhelmed
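
The message-quota and backlog bullets can be illustrated with a bounded queue: once the quota is reached, new messages are rejected at the producer instead of growing an unbounded backlog. In a real broker this is a destination quota setting (e.g. Weblogic JMS quotas); the sketch below, with names of my own choosing, only shows the mechanism.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Quota sketch: a bounded queue gives backpressure. offer() fails fast
// when the quota is exhausted, so the producer can slow down or divert
// messages rather than overwhelm the consumer.
public class BoundedMailbox {
    private final ArrayBlockingQueue<String> queue;

    public BoundedMailbox(int quota) {
        this.queue = new ArrayBlockingQueue<>(quota);
    }

    /** Non-blocking send: false means the quota is full (backpressure signal). */
    public boolean offer(String message) {
        return queue.offer(message);
    }

    /** Consumer side: null when the mailbox is empty. */
    public String poll() {
        return queue.poll();
    }

    public int depth() {
        return queue.size();
    }
}
```

Monitoring `depth()` against the quota corresponds to the "monitor the queue and take action" advice above.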
Factors affecting XML web service performances:
  • message size.
  • schema types (e.g. int is faster than integer, time is faster than dateTime).
  • handlers (e.g. handlers for inserting/removing WS-Security headers) and transformation (use XPath best practices, e.g. avoid using //).

Performance tests:
  • Use performance tests & profiling to find the bottlenecks. You don't have time to tune everything, so you need to know which parts should be prioritized for tuning.
  • Load tests & stress tests: make sure that the response time is acceptable, within the SLA limit.
  • Is the worst response time within the timeout settings of the load balancers/proxies?
  • Load tests & stress tests: make sure that the processing capacity is acceptable (e.g. 100M XML messages/second). What are the peak volumes?
  • Load test & stress test: make sure that the memory requirement is acceptable
  • Use performance test to tune database using profiling tools.
  • Actually, more appropriate performance metrics are in terms of computation counts (e.g. CPU cycles, numbers of read operations) instead of execution time (which depends on the hardware and on contending CPU/disk operations during the test). Nevertheless the performance metrics commonly used in SLAs are in terms of time (e.g. web application response time); therefore ideally we use both operation counts & time (e.g. as reported by the tkprof tool in the Oracle database).
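
Checking response times against an SLA limit, as the load-test bullets suggest, usually means looking at a percentile rather than the average. A minimal sketch (my own helper names; nearest-rank percentile, one of several common definitions):

```java
import java.util.Arrays;

// SLA-check sketch: given response-time samples from a load test,
// compute the nearest-rank percentile and compare it to the SLA limit.
public class SlaCheck {
    /** Nearest-rank percentile, p in (0, 100]. */
    public static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** True when the p-th percentile response time is within the SLA limit. */
    public static boolean withinSla(long[] samplesMillis, double p, long limitMillis) {
        return percentile(samplesMillis, p) <= limitMillis;
    }
}
```

For example, with samples {10, 20, 30, 40, 100} ms the 95th percentile is 100 ms, so an SLA of "95% of requests under 50 ms" fails even though the average (40 ms) looks fine.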
Source: Steve's blogs

Any comments are welcome :)



References:
  • Patterns for Performance and Operability by Ford
  • Blueprints for High Availability by Marcus & Stern
  • Scalability Rules by Abbott & Fisher
  • High Availability and Disaster Recovery by Schmidt
  • The Art of Scalability by Abbott & Fisher
  • Code Complete by McConnell
  • Java Concurrency
  • Report review & test checklist, University of Washington
  • IEEE Standard for Software Reviews and Audits 1028-2008
  • The Definitive Guide to SOA by Davies
  • Enterprise Integration Patterns by Hohpe
  • SOA Principles of Service Design by Erl
  • Improving .NET Application Performance and Scalability by Meier
  • Patterns of Enterprise Application Architecture by Fowler
  • Hacking Exposed Web Applications by Scambray
  • OWASP Web Service Security Cheat Sheet
  • OWASP Code Review Guide
  • Improving Web Services Security (Microsoft patterns & practices) by Meier
  • Concurrency Series: Basics of Transaction Isolation Levels by Sunil Agarwal
