A Data Scientist's blog: Sunday, December 18, 2011

Sunday, December 18, 2011

What is Stream Processing ???

Stream processing is a computer programming paradigm, related to SIMD (single instruction, multiple data), that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the FPUs on a GPU or field programmable gate arrays (FPGAs)^[1], without explicitly managing allocation, synchronization, or communication among those units.
The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a set of data (a stream), a series of operations (kernel functions) are applied to each element in the stream. Uniform streaming, where one kernel function is applied to all elements in the stream, is typical. Kernel functions are usually pipelined, and local on-chip memory is reused to minimize external memory bandwidth. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use scoreboarding, for example, to launch DMAs at runtime, when dependencies become known. The elimination of manual DMA management reduces software complexity, and the elimination of hardware caches reduces the amount of die area not dedicated to computational units such as ALUs.
During the 1980s stream processing was explored within dataflow programming. An example is the language SISAL (Streams and Iteration in a Single Assignment Language).

Applications

Stream processing is essentially a compromise, driven by a data-centric model that works very well for traditional DSP or GPU-type applications (such as image, video and digital signal processing) but less so for general purpose processing with more randomized data access (such as databases). By sacrificing some flexibility in the model, the implications allow easier, faster and more efficient execution. Depending on the context, processor design may be tuned for maximum efficiency or a trade-off for flexibility.
Stream processing is especially suitable for applications that exhibit three application characteristics^{[citation needed]}:

Compute Intensity, the number of arithmetic operations per I/O or global memory reference. In many signal processing applications today it is well over 50:1 and increasing with algorithmic complexity.
Data Parallelism exists in a kernel if the same function is applied to all records of an input stream and a number of records can be processed simultaneously without waiting for results from previous records.
Data Locality is a specific type of temporal locality common in signal and media processing applications where data is produced once, read once or twice later in the application, and never read again. Intermediate streams passed between kernels as well as intermediate data within kernel functions can capture this locality directly using the

For More Info Please visit Vikipedia link....

http://en.wikipedia.org/wiki/Stream_processing

Thank You...

What is Stream Computing ??

In computing, the term stream is used in a number of ways, in all cases referring to a sequence of data elements made available over time. A stream can be thought of as a conveyor belt that allows items to be processed one at a time rather than in large batches.

On Unix and related systems based on the C language, a stream is a source or sink of data, usually individual bytes or characters. Streams are an abstraction used when reading or writing files, or communicating over network sockets. The standard streams are three streams made available to all programs.
Pipelines can also be understood as streams as well as any unlimited (non-packaged) information that is inserted by a device.
In the Scheme language and some others, a stream is a lazily evaluated or delayed sequence of data elements. A stream can be used similarly to a list, but later elements are only calculated when needed. Streams can therefore represent infinite sequences and series.^[1] See Stream (type theory).
In the Smalltalk standard library and in other programming languages as well, a stream is an external iterator. As in Scheme, streams can represent finite or infinite sequences.
Stream processing — in parallel processing, especially in graphic processing, the term stream is applied to hardware as well as software. There it defines the quasi-continuous flow of data that is processed in a dataflow programming language as soon as the program state meets the starting condition of the stream.^[2]
Filesystems can store multiple named independent streams against a single filename. There is one main stream that makes up the normal file data. Additional streams can be used to store icons, summary and indexing information, zone information (i.e., where the file was downloaded from) etc.^[3]
^{For More Info.... Please Visit Wikipedia Link}
^{http://en.wikipedia.org/wiki/Stream_%28computing%29}
^{http://en.wikipedia.org/wiki/Stream_%28computing%29}

What is Distributed Computing ??

Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.^[1]
Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.^[2]

Parallel and distributed computing

Distributed systems are groups of networked computers, which have the same goal for their work. The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them.^[13] The same system may be characterised both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.^[14] Parallel computing may be seen as a particular tightly-coupled form of distributed computing,^[15] and distributed computing may be seen as a loosely-coupled form of parallel computing.^[5] Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:

In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.^[16]
In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.^[17]

The figure on the right illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; as usual, the system is represented as a network topology in which each node is a computer and each line connecting the nodes is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.
The situation is further complicated by the traditional uses of the terms parallel and distributed algorithm that do not quite match the above definitions of parallel and distributed systems; see the section Theoretical foundations below for more detailed discussion. Nevertheless, as a rule of thumb, high-performance parallel computation in a shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms.

Applications

There are two main reasons for using distributed systems and distributed computing. First, the very nature of the application may require the use of a communication network that connects several computers. For example, data is produced in one physical location and it is needed in another location.
Second, there are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.^[21]
Examples of distributed systems and applications of distributed computing include the following:^[22]

Telecommunication networks:
- Telephone networks and cellular networks.
- Computer networks such as the Internet.
- Wireless sensor networks.
- Routing algorithms.
Network applications:
- World wide web and peer-to-peer networks.
- Massively multiplayer online games and virtual reality communities.
- Distributed databases and distributed database management systems.
- Network file systems.
- Distributed information processing systems such as banking systems and airline reservation systems.
Real-time process control:
- Aircraft control systems.
- Industrial control systems.
Parallel computation:
- Scientific computing, including cluster computing and grid computing and various volunteer computing projects; see the list of distributed computing projects.
- Distributed rendering in computer graphics.
For More Info... Please visit Wikipedia Link

http://en.wikipedia.org/wiki/Distributed_computing

^{Thank You....}

What is Cloud Computing ??

Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a metered service over a network (typically the Internet).

Cloud computing is a marketing term for technologies that provide computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. A parallel to this concept can be drawn with the electricity grid, wherein end-users consume power without needing to understand the component devices or infrastructure required to provide the service.
Cloud computing describes a new supplement, consumption, and delivery model for IT services based on Internet protocols, and it typically involves provisioning of dynamically scalable and often virtualised resources.^[1]^[2] It is a byproduct and consequence of the ease-of-access to remote computing sites provided by the Internet.^[3] This may take the form of web-based tools or applications that users can access and use through a web browser as if the programs were installed locally on their own computers.^[4]
Cloud computing providers deliver applications via the internet, which are accessed from web browsers and desktop and mobile apps, while the business software and data are stored on servers at a remote location. In some cases, legacy applications (line of business applications that until now have been prevalent in thin client Windows computing) are delivered via a screen-sharing technology, while the computing resources are consolidated at a remote data center location; in other cases, entire business applications have been coded using web-based technologies such as AJAX.
At the foundation of cloud computing is the broader concept of infrastructure convergence (or Converged Infrastructure) and shared services.^[5] This type of data center environment allows enterprises to get their applications up and running faster, with easier manageability and less maintenance, and enables IT to more rapidly adjust IT resources (such as servers, storage, and networking) to meet fluctuating and unpredictable business demand.^[6]^[7]
Most cloud computing infrastructures consist of services delivered through shared data-centers and appearing as a single point of access for consumers' computing needs. Commercial offerings may be required to meet service-level agreements (SLAs), but specific terms are less often negotiated by smaller companies.^[8]^[9]
The tremendous impact of cloud computing on business has prompted the federal United States government to look to the cloud as a means to reorganize their IT infrastructure and decrease their spending budgets. With the advent of the top government official mandating cloud adoption, many agencies already have at least one or more cloud systems online.^[10]

Cloud computing shares characteristics with:

Autonomic computing — Computer systems capable of self-management.^[11]
Client–server model — Client–server computing refers broadly to any distributed application that distinguishes between service providers (servers) and service requesters (clients).^[12]
Grid computing — "A form of distributed and parallel computing, whereby a 'super and virtual computer' is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks."
Mainframe computer — Powerful computers used mainly by large organisations for critical applications, typically bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.^[13]
Utility computing — The "packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility, such as electricity."^[14]
Peer-to-peer — Distributed architecture without the need for central coordination, with participants being at the same time both suppliers and consumers of resources (in contrast to the traditional client–server model).

Cloud computing exhibits the following key characteristics:

Empowerment of end-users of computing resources by putting the provisioning of those resources in their own control, as opposed to the control of a centralized IT service (for example)
Agility improves with users' ability to re-provision technological infrastructure resources.
Application programming interface (API) accessibility to software that enables machines to interact with cloud software in the same way the user interface facilitates interaction between humans and computers. Cloud computing systems typically use REST-based APIs.
Cost is claimed to be reduced and in a public cloud delivery model capital expenditure is converted to operational expenditure.^[15] This is purported to lower barriers to entry, as infrastructure is typically provided by a third-party and does not need to be purchased for one-time or infrequent intensive computing tasks. Pricing on a utility computing basis is fine-grained with usage-based options and fewer IT skills are required for implementation (in-house).^[16]
Device and location independence^[17] enable users to access systems using a web browser regardless of their location or what device they are using (e.g., PC, mobile phone). As infrastructure is off-site (typically provided by a third-party) and accessed via the Internet, users can connect from anywhere.^[16]
Multi-tenancy enables sharing of resources and costs across a large pool of users thus allowing for:
- Centralisation of infrastructure in locations with lower costs (such as real estate, electricity, etc.)
- Peak-load capacity increases (users need not engineer for highest possible load-levels)
- Utilisation and efficiency improvements for systems that are often only 10–20% utilised.^[18]
Reliability is improved if multiple redundant sites are used, which makes well-designed cloud computing suitable for business continuity and disaster recovery.^[19]
Scalability and Elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads.^[20]^[21]
Performance is monitored, and consistent and loosely coupled architectures are constructed using web services as the system interface.^[16]
Security could improve due to centralisation of data, increased security-focused resources, etc., but concerns can persist about loss of control over certain sensitive data, and the lack of security for stored kernels.^[22] Security is often as good as or better than other traditional systems, in part because providers are able to devote resources to solving security issues that many customers cannot afford.^[23] However, the complexity of security is greatly increased when data is distributed over a wider area or greater number of devices and in multi-tenant systems that are being shared by unrelated users. In addition, user access to security audit logs may be difficult or impossible. Private cloud installations are in part motivated by users' desire to retain control over the infrastructure and avoid losing control of information security.
Maintenance of cloud computing applications is easier, because they do not need to be installed on each user's computer.

For more info please visit Wikipedia link

http://en.wikipedia.org/wiki/Cloud_computing

Thank you....

Search This Blog

Sunday, December 18, 2011

What is Stream Processing ???

Applications

What is Stream Computing ??

What is Distributed Computing ??

Parallel and distributed computing

Applications

What is Cloud Computing ??