More and more data-driven companies are looking to adopt stream processing and streaming analytics. With this concise ebook, you’ll learn best practices for designing a reliable architecture that supports this emerging big-data paradigm. Authors Ted Dunning and Ellen Friedman (Real World Hadoop) help you explore some of the best technologies to handle stream processing and analytics, with a focus on the upstream queuing or message-passing layer. To illustrate the effectiveness of these technologies, this book also includes specific use cases. Ideal for developers and non-technical people alike, this book describes:
* Key elements in good design for streaming analytics, focusing on the essential characteristics of the messaging layer
* New messaging technologies, including Apache Kafka and MapR Streams, with links to sample code
* Technology choices for streaming analytics: Apache Spark Streaming, Apache Flink, Apache Storm, and Apache Apex
* How stream-based architectures are helpful to support microservices
* Specific use cases such as fraud detection and geo-distributed data streams
Ted Dunning is Chief Applications Architect at MapR Technologies, and active in the open source community. He currently serves as VP for Incubator at the Apache Foundation, as a champion and mentor for a large number of projects, and as committer and PMC member of the Apache ZooKeeper and Drill projects. Ted is on Twitter as @ted_dunning.
Ellen Friedman, a committer for the Apache Drill and Apache Mahout projects, is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics. Ellen is on Twitter as @Ellen_Friedman.
Media processing applications, such as three-dimensional graphics, video compression, and image processing, currently demand 10-100 billion operations per second of sustained computation. Fortunately, hundreds of arithmetic units can easily fit on a modestly sized 1 cm² chip in modern VLSI. The challenge is to provide these arithmetic units with enough data to enable them to meet the computation demands of media processing applications. Conventional storage hierarchies, which frequently include caches, are unable to bridge the data bandwidth gap between modern DRAM and tens to hundreds of arithmetic units. A data bandwidth hierarchy, however, can bridge this gap by scaling the provided bandwidth across the levels of the storage hierarchy. The stream programming model enables media processing applications to exploit a data bandwidth hierarchy effectively. Media processing applications can naturally be expressed as a sequence of computation kernels that operate on data streams. This programming model exposes the locality and concurrency inherent in these applications and enables them to be mapped efficiently to the data bandwidth hierarchy. Stream programs are able to utilize inexpensive local data bandwidth when possible and consume expensive global data bandwidth only when necessary. Stream Processor Architecture presents the architecture of the Imagine streaming media processor, which delivers a peak performance of 20 billion floating-point operations per second. Imagine efficiently supports 48 arithmetic units with a three-tiered data bandwidth hierarchy. At the base of the hierarchy, the streaming memory system employs memory access scheduling to maximize the sustained bandwidth of external DRAM. At the center of the hierarchy, the global stream register file enables streams of data to be recirculated directly from one computation kernel to the next without returning data to memory.
Finally, local distributed register files that directly feed the arithmetic units enable temporary data to be stored locally so that it does not need to consume costly global register bandwidth. The bandwidth hierarchy enables Imagine to achieve up to 96% of the performance of a stream processor with infinite bandwidth from memory and the global register file.
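As a loose software analogy for the stream programming model described above (not Imagine's actual hardware or toolchain), the sketch below chains three small kernels so that each element flows straight from one kernel to the next without an intermediate buffer, much as the global stream register file passes streams between kernels without a round trip to memory. The kernel names `scale`, `offset`, and `clamp`, and the helper `run_pipeline`, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def scale(stream: Iterable[int], factor: int) -> Iterator[int]:
    """Kernel 1 (hypothetical): multiply every stream element by a constant."""
    for value in stream:
        yield value * factor


def offset(stream: Iterable[int], delta: int) -> Iterator[int]:
    """Kernel 2 (hypothetical): add a constant to every stream element."""
    for value in stream:
        yield value + delta


def clamp(stream: Iterable[int], low: int, high: int) -> Iterator[int]:
    """Kernel 3 (hypothetical): restrict every element to the range [low, high]."""
    for value in stream:
        yield max(low, min(high, value))


def run_pipeline(pixels: Iterable[int]) -> List[int]:
    """Chain the kernels; generators hand each element on lazily,
    so no intermediate list is built between kernels."""
    scaled = scale(pixels, 2)
    shifted = offset(scaled, -10)
    return list(clamp(shifted, 0, 255))


if __name__ == "__main__":
    print(run_pipeline([0, 5, 100, 200]))
```

Only the final `list(...)` call materializes results. Everything before it stays "in flight" between kernels, which is the locality the stream model is meant to expose.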
* Learn the end-to-end process, starting with capture from a video or audio source through to the consumer's media player
* A quick-start guide to streaming media technologies
* How to monetize content and protect revenue with digital rights management
For broadcasters, web developers, and project managers implementing streaming media systems, David Austerberry shows how to deploy the technology on your site, from video and audio capture through to the consumer's media player. The book first deals with Internet basics and gives thorough coverage of telecommunications networks and the last mile to the home. Video and audio formats are covered, as well as compression standards including Windows Media and MPEG-4. The book then guides you through the streaming process, showing in depth how to encode audio and video. The deployment of media servers, live webcasting, and how the stream is displayed by the consumer's media player are also covered. A final section on associated technologies illustrates how you can protect your revenue sources with digital rights management, looks at content delivery networks, and provides examples of successful streaming applications. The supporting website, www.davidausterberry.com/streaming.html, offers updated links to sources of information, manufacturers and suppliers. David Austerberry is co-owner of the new media communications consultancy, Informed Sauce. He has worked with streaming media since the late nineties. Before that, he was a product manager for a number of broadcast equipment manufacturers, and formerly spent many years with a leading broadcaster.
* Learn the end-to-end process, starting with capture from a video or audio source through to the consumer's media player
* A quick-start guide to streaming media technologies
* Fully updated and revised to include a new chapter on streaming to wireless devices and up-to-date technology information for all streaming companies and products
* How to monetize content and protect revenue with digital rights management
This book highlights the different types of data architecture and illustrates the many possibilities hidden behind the term "Big Data", from the usage of NoSQL databases to the deployment of stream analytics architecture, machine learning, and governance. Scalable Big Data Architecture covers real-world, concrete industry use cases that leverage complex distributed applications, which involve web applications, RESTful APIs, and high throughput of large amounts of data stored in highly scalable NoSQL data stores such as Couchbase and Elasticsearch. This book demonstrates how data processing can be done at scale, from the usage of NoSQL data stores to the combination of Big Data distributions. When the data processing is too complex and involves different processing topologies such as long-running jobs, stream processing, correlation of multiple data sources, and machine learning, it’s often necessary to delegate the load to Hadoop or Spark and use the NoSQL store to serve processed data in real time. This book shows you how to choose a relevant combination of big data technologies available within the Hadoop ecosystem. It focuses on processing long jobs, architecture, stream data patterns, log analysis, and real-time analytics. Every pattern is illustrated with practical examples, which use different open source projects such as Logstash, Spark, Kafka, and so on. Traditional data infrastructures are built for digesting and rendering data synthesis and analytics from large amounts of data. This book helps you to understand why you should consider using machine learning algorithms early on in the project, before being overwhelmed by constraints imposed by dealing with the high throughput of Big Data. Scalable Big Data Architecture is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a Big Data project and which tools to integrate into that pattern.
Advances in Computer and Information Sciences and Engineering includes a set of rigorously reviewed world-class manuscripts addressing and detailing state-of-the-art research projects in the areas of Computer Science, Software Engineering, Computer Engineering, and Systems Engineering and Sciences. Advances in Computer and Information Sciences and Engineering includes selected papers from the conference proceedings of the International Conference on Systems, Computing Sciences and Software Engineering (SCSS 2007) which was part of the International Joint Conferences on Computer, Information and Systems Sciences and Engineering (CISSE 2007).
The number of users who rely on the Internet to deliver multimedia content has grown significantly in recent years. As this consumer demand grows, so, too, does our dependency on a wireless and streaming infrastructure which delivers videos, podcasts, and other multimedia. Streaming Media with Peer-to-Peer Networks: Wireless Perspectives offers insights into current and future communication technologies for a converged Internet that promises soon to be dominated by multimedia applications, at least in terms of bandwidth consumption. The book will be of interest to industry managers, and will also serve as a valuable resource to students and researchers looking to grasp the dynamic issues surrounding video streaming and wireless network development.
On behalf of all of the people involved in the program selection, the program committee members as well as numerous other reviewers, we are both relieved and pleased to present you with the proceedings of the 2006 Asia-Pacific Computer Systems Architecture Conference (ACSAC 2006), which is being hosted in Shanghai on September 6–8, 2006. This is the 11th in a series of conferences, which started life in Australia, as the computer architecture component of the Australian Computer Science Week. In 1999 it ventured away from its roots for the first time, and the fourth Australasian Computer Architecture Conference was held in the beautiful City of Sails (Auckland, New Zealand). Perhaps it was because of a lack of any other computer architecture conference in Asia, or just the attraction of traveling to the Southern Hemisphere, but the conference became increasingly international during the subsequent three years and also changed its name to include Computer Systems Architecture, better reflecting the scope of the conference, which embraces both architectural and systems issues. In 2003, the conference again ventured offshore to reflect its constituency and since then has been held in Japan in the beautiful city of Aizu-Wakamatsu, followed by Beijing and Singapore. This year it again returns to China and next year will move to Korea for the first time, where it will be organized by Korea University.
Continuous media streaming systems will shape the future of information infrastructure. The challenge is to design systems and networks capable of supporting millions of concurrent users. Key to this is the integration of fault-tolerant mechanisms to prevent individual component failures from disrupting systems operations. These are just some of the hurdles that need to be overcome before large-scale continuous media services such as video-on-demand can be deployed with maximum efficiency. The author places the subject in context, drawing together findings from the past decade of research whilst examining the technology’s present status and its future potential. The approach adopted is comprehensive, covering topics, notably the scalability and fault-tolerance issues, that previously have not been treated in depth.
* Provides an accessible introduction to the technology, presenting the basic principles for media streaming system design, focusing on the need for the correct and timely delivery of data.
* Explores the use of parallel server architectures to tackle the two key challenges of scalability and fault-tolerance.
* Investigates the use of network multicast streaming algorithms to further increase the scalability of very-large-scale media streaming systems.
* Illustrates all findings using real-world examples and case studies gleaned from cutting-edge worldwide research.
Combining theory and practice, this book will appeal to industry specialists working in content distribution in general and continuous media streaming in particular. The introductory materials and basic building blocks, complemented by amply illustrated, more advanced coverage, provide essential reading for senior undergraduates, postgraduates and researchers in these fields.
The book Kafka Streams: Real-time Stream Processing helps you understand stream processing in general and apply that skill to Kafka Streams programming. This book focuses mainly on the new generation of the Kafka Streams library available in Apache Kafka 2.x. The primary focus of this book is on Kafka Streams. However, the book also touches on the other Apache Kafka capabilities and concepts that are necessary to grasp Kafka Streams programming.
Who should read this book? Kafka Streams: Real-time Stream Processing is written for software engineers who want to develop a stream processing application using the Kafka Streams library. I am also writing this book for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure. Another audience is managers and architects who do not work directly with Kafka implementation but work with the people who implement Kafka Streams at the ground level.
What should you already know? This book assumes that the reader is familiar with the basics of the Java programming language. The source code and examples in this book use Java 8, and I will be using Java 8 lambda syntax, so experience with lambdas will be helpful. Kafka Streams is a library that runs on Kafka. Having a good fundamental knowledge of Kafka is essential to get the most out of Kafka Streams. I will touch on the mandatory Kafka concepts for those who are new to Kafka. The book also assumes that you have some familiarity and experience in running and working on the Linux operating system.