
Online Video Keeps Pace With the Big Data Trend


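Every year, some terms are so vague, and reach hype status so quickly, that the average person can be forgiven for thinking, at the same time, “Why don’t I know what this means?” and “How can I get away without explaining what it means?”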

The breakout term of 2013 is Big Data, which comes at us streaming types from the far reaches of the often esoteric world of databases. How do we know it’s the breakout term? Analysis passed on by a colleague of mine who works for a Big Data analysis start-up shows that almost as many articles were written on Big Data in the first half of 2013 as were written during all of 2012.

"For the whole of 2012, the number of articles in technology publications written about big data was just under 19,000 articles,” says Susan Puccinelli, 传媒总监 Datameer. “在2013年上半年, 已经有将近14个了,000 articles, 而且这个数字每个月都在增长.”

In other words, at the first-half 2013 pace, more than 700 articles were being written on this one topic alone every 10 days.
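As a rough check on that rate, the per-10-day pace can be computed from the rounded figures Puccinelli quotes. This sketch assumes 366 days for 2012 (a leap year) and 181 days for January through June 2013; because the inputs are approximate, so are the results: about 519 articles per 10 days in 2012 and about 773 in the first half of 2013.

```python
# Rounded article counts quoted by Datameer's Susan Puccinelli.
ARTICLES_2012 = 19_000     # "just under 19,000" for all of 2012
ARTICLES_H1_2013 = 14_000  # "nearly 14,000" for January through June 2013

DAYS_2012 = 366     # 2012 was a leap year
DAYS_H1_2013 = 181  # Jan 1 through Jun 30, 2013


def per_ten_days(count: int, days: int) -> float:
    """Average number of articles published in a 10-day window."""
    return count / days * 10


rate_2012 = per_ten_days(ARTICLES_2012, DAYS_2012)
rate_h1_2013 = per_ten_days(ARTICLES_H1_2013, DAYS_H1_2013)

print(f"2012: ~{rate_2012:.0f} articles per 10 days")        # ~519
print(f"H1 2013: ~{rate_h1_2013:.0f} articles per 10 days")  # ~773
```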

What Is Big Data?

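A little later, I’ll cover the part of Big Data that relates to streaming, but, first, it helps to set a definition, because the topic itself can be hard to pin down. To do that, let’s start by looking at it from a pure database perspective.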

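The term Big Data, when applied to a typical database scenario, boils down to three main areas: the aggregation of many different databases, the inclusion of schema-less data, and a set of analytics tools for deriving meaning from the data.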

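The king of the database world, in terms of overall deployments, is the relational database. This type of database exists in almost every software and hardware product, even down to the operating system level, because it is very good at managing data that fits a structure, or schema.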

Most relational databases have a problem, but the problem has been masked by decades of management tools (relational database management systems, or RDBMSs): Not all data fits within a specific schema. In addition, some data may be better used in two locations or tables in a traditional relational schema, such as a table of the houses on a given street and a table of ownership of the cars parked on that same street.

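To handle information spread across different tables, relational databases use keys: a primary key uniquely identifies each row in one table, and a matching value stored in another table (a foreign key) links the two tables together. But data about the cars parked on the street changes frequently, and it doesn’t fit into a meaningful schema as easily as data about a fixed asset such as a house.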
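The houses-and-cars example can be sketched with Python’s built-in `sqlite3` module. The table and column names (`houses`, `parked_cars`, `house_id`) and the sample rows are hypothetical; the point is that `parked_cars.house_id` is a foreign key pointing at the primary key of `houses`, so every time a car moves, its row has to be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE houses (
    house_id INTEGER PRIMARY KEY,  -- primary key: one row per house
    street   TEXT NOT NULL,
    number   INTEGER NOT NULL,
    owner    TEXT NOT NULL
);
CREATE TABLE parked_cars (
    car_id   INTEGER PRIMARY KEY,
    plate    TEXT NOT NULL,
    model    TEXT NOT NULL,
    owner    TEXT NOT NULL,
    house_id INTEGER REFERENCES houses(house_id)  -- foreign key: house the car is parked at
);
""")
conn.executemany("INSERT INTO houses VALUES (?, ?, ?, ?)",
                 [(1, "Elm St", 10, "Alice"), (2, "Elm St", 12, "Bob")])
conn.executemany("INSERT INTO parked_cars VALUES (?, ?, ?, ?, ?)",
                 [(100, "ABC123", "Civic", "Alice", 1),
                  (101, "XYZ789", "Corolla", "Carol", 2)])

# Join the two tables through the key to see which car sits in front of which house.
rows = conn.execute("""
SELECT h.number, h.owner, c.plate, c.owner
FROM parked_cars AS c
JOIN houses AS h ON h.house_id = c.house_id
WHERE h.street = 'Elm St'
ORDER BY h.number
""").fetchall()
for row in rows:
    print(row)

# The car moves: its link must be rewritten, which is the churn described above.
conn.execute("UPDATE parked_cars SET house_id = 2 WHERE car_id = 100")
moved = conn.execute(
    "SELECT plate FROM parked_cars WHERE house_id = 2 ORDER BY plate").fetchall()
```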

To combat this issue, a number of “dirty data” options have emerged in the database world, from the simplest XML markup documents to more powerful document-based databases that use on-the-fly indexing to map out the similarities and reduce them down to dynamic “table” structures. Many of these are called map-reduce databases, in which the answers derived from queries to the database are presupposed and a basic schema is formed around queries for the schema-less data.
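A minimal in-memory sketch of the map-reduce idea, assuming a handful of hypothetical schema-less records held as Python dictionaries. The query (“how many cars of each model are parked on each street?”) is decided up front, the map step emits a key only for records that can answer it, and the reduce step folds those keys into a small index. Real map-reduce systems distribute both steps across many machines; this only shows the shape of the technique.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical schema-less records: each carries whatever fields it happens to have.
documents = [
    {"type": "house", "street": "Elm St", "number": 10, "owner": "Alice"},
    {"type": "car", "plate": "ABC123", "model": "Civic", "street": "Elm St"},
    {"type": "car", "plate": "DEF321", "model": "Civic", "street": "Elm St"},
    {"type": "car", "plate": "XYZ789", "model": "Corolla", "street": "Elm St", "color": "red"},
    {"type": "car", "plate": "JKL456", "model": "Civic", "street": "Oak Ave"},
]


def map_doc(doc):
    """Map: emit ((street, model), 1) for records that can answer the presupposed query."""
    if doc.get("type") == "car" and "model" in doc and "street" in doc:
        yield ((doc["street"], doc["model"]), 1)


def reduce_pair(index, pair):
    """Reduce: sum the emitted counts per key into the index."""
    key, value = pair
    index[key] += value
    return index


pairs = (pair for doc in documents for pair in map_doc(doc))
index = reduce(reduce_pair, pairs, defaultdict(int))
print(dict(index))
```

Records missing a field, or carrying extra ones such as `color`, are handled without any schema change; they simply do or don’t emit a key.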

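A third area of databases, which is especially popular with social media networks, is the graph database. In this instance, relationships are the key factor, and the graph is a new way of performing complex searches.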

In our earlier example, the proximity of a car’s parked position on the street relative to a given house might allow us to make a map-reduce index, but a graph database would allow us to find friends of a particular homeowner who own the same model of car that’s parked on the street, in addition to confirming whether that particular friend lives in town or in another country. Facebook’s rollout of Graph Search will allow its users to do exactly these kinds of searches on friends who meet particular demographic or geographic criteria.
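A graph query like the one described can be sketched as a breadth-first traversal over an adjacency list. Everything here is a hypothetical toy: the people, friendships, car models, and towns are invented, and a production graph database would use its own query language and storage rather than Python dictionaries.

```python
from collections import deque

# Hypothetical toy social graph plus two per-person attributes.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice"},
    "Dave": {"Bob"},
}
car_model = {"Alice": "Civic", "Bob": "Civic", "Carol": "Corolla", "Dave": "Civic"}
home_town = {"Alice": "Springfield", "Bob": "Springfield",
             "Carol": "Springfield", "Dave": "Toronto"}


def friends_with_model(start, model, max_hops=2):
    """Breadth-first search out to max_hops; return (person, hops, town) for matching owners."""
    seen = {start}
    queue = deque([(start, 0)])
    matches = []
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't expand beyond the hop limit
        for friend in sorted(friends.get(person, ())):
            if friend in seen:
                continue
            seen.add(friend)
            if car_model.get(friend) == model:
                matches.append((friend, hops + 1, home_town.get(friend)))
            queue.append((friend, hops + 1))
    return matches


# Homeowner Alice; a Civic is parked on her street.
result = friends_with_model("Alice", "Civic")
print(result)  # [('Bob', 1, 'Springfield'), ('Dave', 2, 'Toronto')]
```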

A graphic showing the relationships between pieces of data in a relational database.

In many ways, Big Data in general, and graph databases in particular, rely on the use of tags to create relationships between objects, persons, and other disparate data. To get to the point where any person or object is tagged enough to have value in a search graph, a great deal of indexing work is required.
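One way to see why tagging demands so much indexing is to build an inverted index from tags to objects and derive relationships from shared tags. This is an illustrative sketch with hypothetical object IDs and tags; a real system would also have to keep the index current as tags are added and removed.

```python
from collections import defaultdict

# Hypothetical tagged objects; tags are free-form strings, not a fixed schema.
tagged = {
    "video:123": {"cooking", "italian", "alice"},
    "video:456": {"cooking", "bob"},
    "photo:789": {"alice", "bob", "beach"},
}

# Inverted index: tag -> set of object IDs carrying that tag.
index = defaultdict(set)
for obj_id, tags in tagged.items():
    for tag in tags:
        index[tag].add(obj_id)


def related(obj_id):
    """Objects sharing at least one tag with obj_id, mapped to the shared tags (the edge labels)."""
    result = defaultdict(set)
    for tag in tagged[obj_id]:
        for other in index[tag]:
            if other != obj_id:
                result[other].add(tag)
    return dict(result)


print(sorted(index["alice"]))  # ['photo:789', 'video:123']
print(related("photo:789"))
```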

How Does This Fit Into Streaming?

So if Big Data is about combining databases and running the proper analytics to find the answers behind the questions, how does all of this apply to streaming?

You’ll get widely varying answers to that question, but the map-reduce index that I derive from Big Data in streaming comes down to three things: content management, mission-critical delivery, and index and metadata availability.

Content Management and Storage

Without a doubt, video’s share of total data traffic has grown by leaps and bounds. Some studies suggest that on-demand internet video traffic alone accounts for almost one-third of all internet traffic during primetime hours, thanks in large part to Netflix. Some traffic estimates for 2014 suggest that the majority of all data delivered across the internet will be video content.

This presents an opportunity for content delivery network (CDN) providers, some of which have already risen to the challenge. But the Big Data issue for these CDNs is less about video content management and more about management of all the content surrounding the video.

Interestingly, of all the content management issues CDNs face, streaming content management is actually becoming a fairly simple one: keep track of the multiple versions of an on-demand video file that are needed for adaptive bitrate (ABR) delivery. Whether ABR is delivered via Apple’s HTTP Live Streaming (HLS), Microsoft’s Smooth Streaming, or the emerging Dynamic Adaptive Streaming over HTTP (MPEG-DASH), progress has been made on all fronts in allowing on-the-fly segmentation for these various ABR technologies.
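To illustrate what “multiple versions of an on-demand video file” looks like for HLS, the sketch below writes a master playlist that lists one media playlist per rendition, using the `#EXTM3U` and `#EXT-X-STREAM-INF` tags (with `BANDWIDTH` and `RESOLUTION` attributes) defined in the HLS specification, RFC 8216. The bitrate ladder and the `name/index.m3u8` path layout are hypothetical, and Smooth Streaming and MPEG-DASH use their own manifest formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One encoded version of the same on-demand title (hypothetical ladder)."""
    name: str
    bandwidth_bps: int  # peak bits per second, as the HLS BANDWIDTH attribute expects
    width: int
    height: int


LADDER = [
    Rendition("1080p", 6_000_000, 1920, 1080),
    Rendition("720p", 3_000_000, 1280, 720),
    Rendition("360p", 800_000, 640, 360),
]


def hls_master_playlist(renditions):
    """Build an HLS master playlist pointing at one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda rend: rend.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},RESOLUTION={r.width}x{r.height}")
        lines.append(f"{r.name}/index.m3u8")  # hypothetical path layout
    return "\n".join(lines) + "\n"


playlist = hls_master_playlist(LADDER)
print(playlist, end="")
```

The player reads this one small file and then switches among the listed renditions as bandwidth changes, which is why the CDN’s bookkeeping centers on keeping every version of the title available.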

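In 2007, 50% of internet traffic came from a few thousand sites, but by 2009, 50% came from 150 sites (left). Today (right), 50% of internet traffic comes from 35 sites or services. (Charts courtesy of DeepField)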

This means that we no longer have to keep track of thousands of 2-second segments in permanent databases, something that, a few years ago, Netflix projected would exceed 10 billion assets if they were required to store premium content in pre-segmented form.
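The arithmetic behind that storage problem, and the alternative of computing segments on request, can be sketched as follows. The 2-second segment length comes from the text; the 2-hour title and 8-rendition ladder are hypothetical, and a real packager would also align cuts to keyframes and map time ranges to byte ranges in the source file, which this sketch omits.

```python
import math

SEGMENT_SECONDS = 2.0  # segment length cited in the text


def segment_count(duration_s, segment_s=SEGMENT_SECONDS):
    """Number of segments needed to cover a title of the given duration."""
    return math.ceil(duration_s / segment_s)


def segment_range(index, duration_s, segment_s=SEGMENT_SECONDS):
    """Compute a segment's time range on request instead of storing it as its own asset."""
    if not 0 <= index < segment_count(duration_s, segment_s):
        raise IndexError(f"segment {index} out of range")
    start = index * segment_s
    end = min(start + segment_s, duration_s)  # the last segment may be shorter
    return start, end


# Hypothetical example: a 2-hour title encoded at 8 ABR renditions.
duration = 2 * 60 * 60
renditions = 8
per_title = segment_count(duration) * renditions

print(f"pre-segmented assets for one title: {per_title:,}")  # 28,800
print(segment_range(0, duration))     # (0.0, 2.0)
print(segment_range(3599, duration))  # (7198.0, 7200.0)
```

Multiplied across a large catalog, pre-cut segments quickly run into the billions of stored objects, whereas the on-the-fly approach stores one file per rendition and derives each segment from its index.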

Delivery

The second area getting attention is content delivery, especially with the growth of “hyper-giant” sites.

A presentation by Craig Labovitz, co-founder and CEO of DeepField, at the 2013 Content Delivery Summit zeroed in on the growth issues facing CDNs when it comes to content management.

“CDN traffic now represents more than half of all consumer traffic in the United States,” says Labovitz. “That’s a very dramatic change from the last report we published, in 2009.”

Labovitz notes that the consolidation of traffic to a few key CDNs has been an ongoing trend, with 50% of traffic in 2007 coming from several thousand sites. By 2009, the number of sites required to reach half of the data consumed on the North American internet was down to several hundred, and the new report’s initial data suggests the number of sites required is now fewer than 40 CDN or top 10 sites.
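The “number of sites needed to reach half of all traffic” metric can be computed by sorting sites by volume and accumulating until the running total crosses 50%. The traffic figures below are in hypothetical units, chosen to contrast a concentrated distribution with a flat one; they are not DeepField’s data.

```python
def sites_for_share(traffic_by_site, share=0.5):
    """Return how many of the largest sites together carry at least `share` of total traffic."""
    volumes = sorted(traffic_by_site.values(), reverse=True)
    total = sum(volumes)
    running = 0
    for count, volume in enumerate(volumes, start=1):
        running += volume
        if running >= share * total:
            return count
    return len(volumes)  # only reached for an empty input


# Concentrated: three "hyper-giants" plus a long tail of small sites (hypothetical units).
concentrated = {"giant_a": 30, "giant_b": 15, "giant_c": 10}
concentrated.update({f"small_{i}": 1 for i in range(45)})

# Flat: 100 equally sized sites.
flat = {f"site_{i}": 1 for i in range(100)}

print(sites_for_share(concentrated))  # 3
print(sites_for_share(flat))          # 50
```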

“We’re increasingly moving toward a very flat, dense, highly interconnected network,” Labovitz noted in an earlier session. “[M]ost of the traffic isn’t flowing up along [a] tree to reach the Tier 1s and back down. Most of the traffic today is interchange between what we’ve been calling the hyper-giants.”
