0% found this document useful (0 votes)

27 views7 pages

Semantic Class Detectors in Video Genre Recognition

Uploaded by

Bk Rayen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views7 pages

Semantic Class Detectors in Video Genre Recognition

Uploaded by

Bk Rayen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

SEMANTIC CLASS DETECTORS IN VIDEO GENRE

RECOGNITION

Michal Hradiš 1 , Ivo Řeznı́ček1 and Kamil Behúň1

1 Department of Computer Graphics and Multimedia, Brno University of Technology, Bozetechova 1/2, Brno, Czech
Republic
{ihradis, ireznice}@fit.vutbr.cz, xbehun03@stud.fit.vutbr.cz

Keywords: Video Genre Recognition, Semantic Indexing, Local Features, SIFT, SVM

Abstract: This paper presents our approach to video genre recognition which we developed for MediaEval 2011 evalua-
tion. We treat the genre recognition task as a classification problem. We encode visual information in standard
way using local features and Bag of Word representation. Audio channel is parameterized in similar way
starting from its spectrogram. Further, we exploit available automatic speech transcripts and user generated
meta-data for which we compute BOW representations as well. It is reasonable to expect that semantic con-
tent of a video is strongly related to its genre, and if this semantic information was available it would make
genre recognition simpler and more reliable. To this end, we used annotations for 345 semantic classes from
TRECVID 2011 semantic indexing task to train semantic class detectors. Responses of these detectors were
then used as features for genre recognition. The paper explains the approach in detail, it shows relative per-
formance of the individual features and their combinations measured on MediaEval 2011 genre recognition
dataset, and it sketches possible future research. The results show that, although, meta-data is more infor-
mative compared to the content-based features, results are improved by adding content-based information to
the meta-data. Despite the fact that the semantic detectors were trained on completely different dataset, using
them as feature extractors on the target dataset provides better result than the original low-level audio and
video features.

1 INTRODUCTION oped for MediaEval 2011 Genre Tagging Task (Lar-

son et al., 2011; Hradis et al., 2011). The data pro-
vided for this task include Creative Commons videos
Genre is one of the most basic information which
downloaded from blip.tv, automatic speech recogni-
is available for movies, TV shows and other profes-
tion (ASR) transcripts (Gauvain et al., 2002), user
sional video content. It is presented to viewers in
supplied metadata for the videos and relevant social
television programs, cinemas, rental shops and their
data from Twitter. Each video was assigned one of
on-line variants. It is important information a poten-
26 exclusive genre categories. Some of the results re-
tial viewer takes into consideration when choosing the
ported in this paper are the runs submitted for eval-
content to view.
uation while other results were acquired later on re-
Although the genre information is available for leased test data using the official MediaEval evalua-
commercial content, it is not always supplied for user tion methodology.
videos and other amateur content, or this informa-
tion is not detailed or reliable enough. For example, We approached the genre recognition task as a
Youtube allows users to choose one of 15 mutually classification problem in a way which is consistent
exclusive categories without any suggestion based on with the current trends in extraction of semantic infor-
the video content. Video sharing sites would benefit mation from video (Snoek et al., 2010; Smeaton et al.,
from automatic genre recognition when this informa- 2009). We relied on Support Vector Machine (SVM)
tion is not supplied by the user, and uploading videos to learn models of the individual genres based on sev-
could be made easier by suggesting genre to the user eral types of Bag of Word (BOW) representations.
based on the content. These classifiers, each using different type of fea-
The approach presented in this paper was devel- tures, were fused by Logistic regression to obtain final
decision rules. The information sources used in the and several BOW representations based on differ-
experiments were still images extracted from video, ent modalities. Several types of features are com-
audio, ASR transcripts and from user-supplied meta- bined by classifier fusion in order to get more robust
data. Local image features (van de Sande et al., 2010; and reliable classifications. The following text de-
Mikolajczyk and Schmid, 2005) were extracted from scribes all the parts of the classification framework
the images. The audio was transformed into spectro- (see Figure1), and it introduces the semantic classi-
grams and from these local image features were ex- fiers and explains how they are used for genre recog-
tracted as well. The local features were transformed nition.
to BOW by codebook transform (van Gemert et al.,
2010). Similarly, BOW representations were con- 2.1 Feature Extraction
structed from the ASR transcripts and metadata.
The approach outlined in the previous paragraph
For the purpose of extraction of visual features, 100
aims to map low-level visual and acoustic information
frames were extracted from each video. The frames
directly to the video genre categories. Although some
were extracted equidistantly from the whole video.
genres can be characterized by general visual and au-
We did not use shot change information or any type
dio properties, such as dark colors, fast movement or
of key-frame detection. The reason for this is that we
dissonant and unsettling music (Brezeale and Cook,
intended to sample the videos as representatively as
2008), it is quite a leap from the the low level features
possible, and shot information or key-frame detection
to the very high level genre information. Having se-
could favor certain type of content (e.g. fast dynamic
mantic information (e.g. objects, activities, environ-
scenes).
ment type, screams and gun shots) extracted from the
video would definitely aid in recognizing the genres. From each image, six types of local features were
However, todays automatic methods for extraction of extracted. These six types were created by combining
semantic information from video reach only moder- three image sampling methods and two descriptors.
ate accuracy and require considerable resources and The first sampling method uses the Harris-Laplace
effort, especially, when many semantic classes are to scale invariant detector (Mikolajczyk, 2004) which
be detected (Snoek et al., 2010; Smeaton et al., 2009). localizes corner-like structures in scale-space. Rota-
Even with the shortcomings of the automatic meth- tion invariance was not used as it generally degrades
ods in mind, it is still reasonable to expect that they results in image recognition problems. Further, two
can provide some useful information for genre recog- dense sampling schemes were employed. In both
nition. We explored this idea by training detectors of cases, circular regions were sampled on a regular grid
345 semantic categories on TRECVID 2011 Seman- with step of 8 pixels. The two schemes differ in the
tic Indexing (SIN) task data, and by using results of radius of the circular regions which was 8 pixels and
these detectors as features for genre recognition. 16 pixels.
Older approaches to video genre recognition The local image patches were parameterized using
are well summarized in a survey by Brezeale and the original SIFT descriptor by Lowe (Lowe, 1999)
Cook (Brezeale and Cook, 2008). Recently, You et and RGB-SIFT (van de Sande et al., 2010). The SIFT
al. (You et al., 2010) proposed a method for seman- descriptor computes Histograms of Oriented Gradi-
tic video genre classification based on Gaussian Mix- ents (HOG) on a 4 × 4 grid centered on an image
ture Models, Hidden Markov Models and naive Bayes patch. The computed descriptors are vectors of 128
classifiers. values created by concatenating the 16 histograms.
The paper is structured as follows. Section 2 The magnitude of a single pixel is distributed be-
describes our approach including feature extraction, tween neighboring histograms according to a spa-
classification and fusion, and it explains how the tial Gaussian filter which alleviates the boundary ef-
TRECVID semantic classifiers are used. The Medi- fect. The SIFT descriptor is invariant to shifts and
aEval data are described in Section 3. Experiments scaling of the image intensity channel. It encodes
and their results are presented in Section 4. Finally, the shape of an image patch while being resistant to
the paper is concluded in Section 5. small displacements and geometric transformations.
The RGB-SIFT descriptor (van de Sande et al., 2010)
computes SIFT independently on R, G and B image
channels. For computational reasons, Principal Com-
2 METHOD ponent Analysis was used to reduce dimensionality of
the computed descriptors to 198.
As mentioned earlier, our approach relies on SVM To create feature vectors suitable for classifica-
classifiers, logistic regression for classifier fusion, tion, codebook transform was used to translate the
Figure 1: The processing pipeline: 100 frames and audio segments (spectrograms) are extracted from a video file. Spatial
sampling is performed. Local patches are described by SIFT and RGB-SIFT. BOW representations are computed by codebook
transform. SVM classifiers are trained for each representation and the classifiers are fused by logistic regression.

sets of local descriptors to BOW representations. where σ defines the size of the kernel. In our experi-
Generally, codebook transform assigns objects to a set ments σ was set to the average distance between two
of prototypes and it computes occurrence frequency closest neighboring codewords from the codebook.
histograms of the prototypes. The prototypes are For parameterization of the audio information, an
commonly called codewords and a set of prototypes approach similar to parameterization of the visual in-
is called a codebook. In our case, the codebooks formation was used. The audio track was regularly
were created by k-means algorithm with Euclidean segmented into 100 possibly overlapping segments.
distance. The size of the codebooks was 4096. The length of the segments was 10 seconds and over-
When assigning local features to codewords by lap was allowed as necessary. Mel-frequency spec-
hard mapping, quantization errors occur and some trograms with 128 frequency bands, maximum fre-
information is lost. This is especially significant in quency 8 KHz, window length 100ms and overlap
high-dimensional spaces, as is the case of the local 80ms were computed from these segments. Dynamic
patch descriptors, where the distances to several near- range of the spectrograms was reduced to fit 8-bit res-
est codewords tend to be very similar. In the context olution. The spectrograms were then processed as im-
of image classification, this issue was discussed for ages by dense sampling and SIFT descriptor. BOW
example by Gambert et al. (van Gemert et al., 2010) representation was constructed for the spectrograms
who propose to distribute local patches to close code- by codebook transform the same way as for images.
words according to codeword uncertainty. Computa- For classification, BOW histograms of the indi-
tion of BOW with codeword uncertainty is defined for vidual images and audio segments were averaged to
each codeword w from a codebook B as get single BOW vector of each representation for each
K (w, p) video.
UNC(w) = ∑ ∑v∈B K (v, p) , (1) From the metadata and the ASR data, XML tags
p∈P were removed together with any non-alphabetical
characters and words where lower-case character was
where P is a set of local image features and K is a
followed by upper-case character were split. Stem-
kernel function. We use Gaussian kernel
ming was not performed on the data. Although,
kw − w0 k22

0 the data includes several Dutch, French and Spanish
K(w, w ) = exp − , (2) videos, we did not employ any machine translation, as
2σ2
the ratio of the non-English videos is relatively small of a particular video was used to compute response on
and it should not seriously influence results. For each that video. Before fusion, classifier responses were
video, separate word occurrence counts for metadata normalized to have zero mean and unit standard de-
and ASR were collected. viation. Multinomial L2-regularized logistic regres-
All feature vectors were normalized to unit length sion was used for the fusion. The regularization pa-
for classification. rameter was estimated by the same grid search and
cross-validation procedure as in the case of the linear
2.2 Classification Scheme SVMs. Considering the different nature of the avail-
able features, the video and audio classifiers (see Sec-
tion 2.1) were fused separately and the classifiers us-
Although, the data in MediaEval genre tagging task is ing semantic features (see further in Section 2.3) were
multi-class (a video is assigned to a single class), the fused separately as well. Finally, the two classifiers
evaluation metric is Mean Average Precision, and the created by fusion were fused again with the classifiers
genre recognition problem is in general multi label - based on the ASR transcripts and metadata. In this
one video may belong to several genres, e.g. Sci-Fi second fusion, single set of weights was computed
and comedy. As a result, we build classifiers for each for the different modalities and these were used for
genre separately and independently. all genres in order to limit overfitting.
The classification structure has two levels. The
first level consists of linear SVM classifiers each
based on a single BOW representation. These clas- 2.3 Semantic Detectors in Genre
sifiers are then fused by logistic regression to produce Recognition
robust estimates of the genres.
SVM (Cortes and Vapnik, 1995) is often used for The TRECVID1 2011 SIN task provided a training
various tasks in image and video classification (Le dataset consisting of approximately 11,200 videos
et al., 2011; van de Sande et al., 2010; van Gemert with total length of 400 hours. The duration of the
et al., 2010; Snoek et al., 2010; Smeaton et al., 2009). videos ranges from 10s to 3.5 minutes. The source
SVM has four main advantages. It generalizes well, it of the videos is Internet Archive2 . The videos were
can use kernels, it is easy to work with, and good- partitioned into 266473 shots (Ayache et al., 2006)
quality SVM solvers are available. Although non- which are represented by a corresponding keyframe.
linear kernel have been shown to perform better in The 500 semantic classes proposed by TRECVID or-
image recognition (Perronnin et al., 2010), we se- ganizers were annotated by active learning3 (Ayache
lected linear kernel due to the very small training set and Quénot, 2007; Ayache and Quénot, 2008). Total
size. Radial Basis Function kernels which are usu- 4.1M hand-annotations were collected and this pro-
ally used (Perronnin et al., 2010; van de Sande et al., duced 18M annotations after propagation using rela-
2010; van Gemert et al., 2010; Snoek et al., 2010) tions (e.g. Cat implies Animal). For 345 classes, the
introduce an additional hyper-parameter which has to annotations contained more than 4 positive instances.
be estimated in cross-validation on the training set. Examples of the classes are Actor, Airplane Flying,
Estimating this parameter together with the SVM reg- Bicycling, Canoe, Doorway, Ground Vehicles, Sta-
ularization parameter could prove to be unreliable on dium, Tennis, Armed Person, Door Opening, George
the small dataset. Bush, Military Buildings, Researcher, Synthetic Im-
The single SVM regularization parameter was es- ages, Underwater and Violent Action.
timated by grid search with 5-fold cross-validation if Using the TRECVID SIN task data, 345 semantic
enough samples for particular class were available. classifiers were trained. These classifiers use the same
The objective function in the grid search was Mean eight BOW feature types and the same SVM classi-
Average Precision and the same parameter was used fiers as described in Section 2.1 and Section 2.2. Fur-
for all genre classes for a particular BOW representa- ther details on these classifiers can be found in (Be-
tion. ran et al., 2011) together with the results achieved in
Due to the fact that no validation set was available, TRECVID 2011 evaluation.
we had to re-use the training set for Logistic Regres- We applied these 345 classifiers to the extracted
sion fusion. To keep the classifiers from overfitting, images and audio segments and created feature rep-
we trained the Logistic Regression on responses of the resentations for the videos by computing histograms
5 classifiers learned in cross-validation with the esti-
mated best value of the SVM hyper-parameter. Each 1 http://trecvid.nist.gov/
classifier computed responses on the part of the data 2 http://www.archive.org/

which it was not trained on. This way, no knowledge 3 http://mrim.imag.fr/tvca/

Genre #videos Features TV11
art 4 DENSE16 CSIFT 0.193 0.194
autos and vehicles 1 DENSE16 SIFT 0.149 0.178
business 7 DENSE8 CSIFT 0.178 0.201
citizen journalism 11 DENSE8 SIFT 0.138 0.187
comedy 6 HARLAP CSIFT 0.186 0.178
conferences and other events 2 HARLAP SIFT 0.170 0.174
default category 40 SPECTRUM DENSE16 SIFT 0.183 0.167
documentary 3 SPECTRUM DENSE16 SIFT 0.175 0.189
educational 19 COMBINED 0.254 0.276
food and drink 4 Table 2: Mean average precision achieved by individual
gaming 4 types of features. Line COMBINED contains results of fu-
health 7 sion of all the presented features.
literature 6
movies and television 6
music and entertainment 2 4 EXPERIMENTS AND RESULTS
personal or auto-biographical 5
politics 45 In experiments, we focused mostly on the perfor-
religion 16 mance the individual content-based features, and on
school and education 2 how much fusion of the metadata with the content-
sports 7 based features improves over the results achieved by
technology 27 using metadata alone. The metadata is in general very
the environment 1 informative as, for example, one uploader usually up-
the mainstream media 5 loads videos of only very small number of genres.
travel 4 The results of visual and audio features are shown
videoblogging 10 in Table 2. In the table, dense sampling is denoted as
web development and sites 3 DENSE16 for patch radius of 16 pixels and DENSE8
for radius of 8 pixels (see Section 2.1. HARLAP
Table 1: Genre categories and their distribution in training
set. stands for the Harris-Laplace detector. The descrip-
tors are denoted as SIFT and CSIFT for the RGB-
SIFT. The audio features are indicated by SPEC-
of their responses. The histograms consisted of 8 TRUM. TV11 represents the classifiers based on re-
equidistant bins with the outer bins set to 5% quan- sponses of TRECVID semantic detectors.
tiles. The dimension of the resulting feature vectors The results show that the semantic classifiers on
obtained by concatenating the histograms of individ- average provide better results. The same is true for
ual semantic classes was 2760. fusion of the the audio and video features where clas-
sifiers using directly the low-level features achieve
MAP 0.254 and those using semantic features achieve
0.276.
3 DATA Table 3 compares results of metadata, ASR tran-
scripts, fusion of content-based features, and fusion
The data released for MediaEval 2011 Genre Tag- of all features including metadata. Metadata by it-
ging Task (Larson et al., 2011) consist of Creative self gives MAP 0.405. On the other hand results of
Commons videos downloaded from blip.tv. The data ASR are much lower. With MAP 0.165 it is slightly
is split into separate training and testing sets. The worse than the individual audio and video features.
training set contains 247 videos and the test set con- By combining all content-based features MAP 0.304
tains 1728 videos. The genre categories and the dis- was achieved. This is still significantly lower than for
tribution of classes in the training set is shown in Ta- the metadata. However, it is important to realize that
ble 1. the content-based features do not include any infor-
The metadata includes information about the mation generated by humans. By combining meta-
videos supplied by uploaders of the video including data with the content-based features, the results are
among others title, description, uploader login name improved by 0.046 reaching MAP 0.451.
and tags. Four runs were submitted to MediaEval 2011
MAP of a random system on the test set should be using the presented approach with several differ-
0.046. ences (Hradis et al., 2011). The most notable differ-
Figure 2: Frames randomly selected from MediaEval 2011 genre tagging task data. The source videos are released under
Creative Commons license.

Features MAP The best result was MAP 0.56 (Rouvier and Linares,
All including metadata 0.451 2011). This result was reached by explicitly using IDs
Metadata 0.405 of the uploaders and the fact that uploaders tend to
All Content-based 0.304 upload similar videos. Other than that, the approach
ASR 0.165 classified the data by SVM on metadata, ASR tran-
Table 3: Mean average precision achieved by fusion of all scripts and audio and video features.
features, metadata alone, all content based features (audio,
video and ASR), and ASR alone.

Run MAP
RUN1 0.165
RUN3 0.346 5 CONCLUSIONS
RUN4 0.322
RUN5 0.360
Table 4: Mean average precision on test set achieved by the The presented genre recognition approach
runs submitted to MediaEval 2011. achieves good result on the datasets used in the
experiments. The results could be even considered
ence was that weights for the classifier fusion were surprisingly good considering the small size of the
set by hand. When fusing the audio and video fea- training set used. However, it is not certain how the
tures, uniform weights were used. RUN1 used only results would generalize to larger and more diverse
ASR. RUN3 combined all features with the weight of datasets.
ASR and METADATA increased to 2.5. RUN4 com- Although the metadata is definitely the most im-
bined the low-level audio and video features, ASR portant source of information for genre recognition,
and metadata. Here the weights of ASR and metadata the audio and video content features improve results
were set to 1.25. RUN5 combined semantic features, when combined with the metadata. Compared to the
ASR and metadata with the same weights as in RUN4. metadata, content-based features achieve worse re-
The results of these runs are show in Table 4. sults, but they do not require any human effort.
The best purely content-based method submitted
to MediaEval 2011 achieved MAP 0.121 (Ionescum The semantic features for classification improve
et al., 2011). Very successful were methods focus- over the low-level features individually, as well as,
ing on metadata and information retrieval methods. when combined.
ACKNOWLEDGEMENTS Mikolajczyk, K. (2004). Scale & Affine Invariant Interest
Point Detectors. International Journal of Computer
Vision, 60(1):63–86.
This work has been supported by the EU FP7
project TA2: Together Anywhere, Together Anytime Mikolajczyk, K. and Schmid, C. (2005). A Performance
Evaluation of Local Descriptors. IEEE Trans. Pattern
ICT-2007-214793, grant no 214793, and by BUT FIT Anal. Mach. Intell., 27(10):1615–1630.
grant No. FIT-11-S-2. Perronnin, F., Senchez, J., and Xerox, Y. L. (2010). Large-
scale image categorization with explicit data embed-
ding. In Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on, pages 2297–
REFERENCES 2304, San Francisco, CA.
Rouvier, M. and Linares, G. (2011). LIA @ MediaEval
Ayache, S. and Quénot, G. (2007). Evaluation of active 2011 : Compact Representation of Heterogeneous De-
learning strategies for video indexing. Signal Process- scriptors for Video Genre Classication. In MediaEval
ing: Image Communication, 22(7-8):692–704. 2011 Workshop, Pisa, Italy.
Ayache, S. and Quénot, G. (2008). Video Corpus Annota- Smeaton, A. F., Over, P., and Kraaij, W. (2009). High-Level
tion Using Active Learning. In Macdonald, C., Ounis, Feature Detection from Video in TRECVid: a 5-Year
I., Plachouras, V., Ruthven, I., and White, R., editors, Retrospective of Achievements. In Divakaran, A., ed-
Advances in Information Retrieval, volume 4956 of itor, Multimedia Content Analysis, Theory and Appli-
Lecture Notes in Computer Science, pages 187–198. cations, pages 151–174. Springer Verlag, Berlin.
Springer Berlin / Heidelberg.
Snoek, C. G. M., van de Sande, K. E. A., de Rooij, O., Hu-
Ayache, S., Quénot, G., and Gensel, J. (2006). CLIPS-LSR urnink, B., Gavves, E., Odijk, D., de Rijke, M., Gev-
Experiments at TRECVID 2006. In TREC Video Re- ers, T., Worring, M., Koelma, D. C., and Smeulders,
trieval Evaluation Online Proceedings. TRECVID. A. W. M. (2010). The MediaMill TRECVID 2010
Beran, V., Hradis, M., Otrusina, L., and Reznicek, I. (2011). Semantic Video Search Engine. In TRECVID 2010:
Brno University of Technology at TRECVid 2011. In Participant Notebook Papers and Slides.
TRECVID 2011: Participant Notebook Papers and van de Sande, K. E. A., Gevers, T., and Snoek, C. G. M.
Slides, Gaithersburg, MD, US. National Institute of (2010). Evaluating Color Descriptors for Object and
Standards and Technology. Scene Recognition. {IEEE} Transactions on Pattern
Brezeale, D. and Cook, D. J. (2008). Automatic Video Clas- Analysis and Machine Intelligence, 32(9):1582–1596.
sification: A Survey of the Literature. IEEE Transac- van Gemert, J. C., Veenman, C. J., Smeulders, A. W. M.,
tions on Systems Man and Cybernetics Part C Appli- and Geusebroek, J. M. (2010). Visual Word Ambigu-
cations and Reviews, 38(3):416–430. ity. PAMI, 32(7):1271–1283.
Cortes, C. and Vapnik, V. (1995). Support-Vector Networks. You, J., Liu, G., and Perkis, A. (2010). A semantic frame-
Machine Learning, 20(3):273–297. work for video genre classification and event analysis.
Gauvain, J.-L., Lamel, L., and Adda, G. (2002). The LIMSI Signal Processing Image Communication, 25(4):287–
Broadcast News transcription system. Speech Com- 302.
munication, 37(1-2):89–108.
Hradis, M., Reznicek, I., and Behun, K. (2011). Brno Uni-
versity of Technology at MediaEval 2011 Genre Tag-
ging Task. In Working Notes Proceedings of the Me-
diaEval 2011 Workshop, Pisa, Italy.
Ionescum, B., Seyerlehner, K., Vertan, C., and Lamber, P.
(2011). Audio-Visual Content Description for Video
Genre Classication in the Context of Social Media. In
MediaEval 2011 Workshop, Pisa, Italy.
Larson, M., Eskevich, M., Ordelman, R., Kofler, C.,
Schmeideke, S., and Jones, G. J. F. (2011). Overview
of MediaEval 2011 Rich Speech Retrieval Task and
Genre Tagging Task. In MediaEval 2011 Workshop,
Pisa, Italy.
Le, Q. V., Zou, W. Y., Yeung, S. Y., and Ng, A. Y. (2011).
Learning hierarchical invariant spatio-temporal fea-
tures for action recognition with independent sub-
space analysis. Learning, pages 1–4.
Lowe, D. G. (1999). Object Recognition from Local Scale-
Invariant Features. In ICCV ’99: Proceedings of the
International Conference on Computer Vision-Volume
2, page 1150, Washington, DC, USA. IEEE Computer
Society.

Survey On Scene Classification Approaches
No ratings yet
Survey On Scene Classification Approaches
6 pages
Classifying YouTube Channels
No ratings yet
Classifying YouTube Channels
42 pages
A Contour-color-Action Approach To Automatic Retrieval of Several Common Video Genres
No ratings yet
A Contour-color-Action Approach To Automatic Retrieval of Several Common Video Genres
15 pages
Video Genre Verification Using Both Acoustic and Visual Modes
No ratings yet
Video Genre Verification Using Both Acoustic and Visual Modes
4 pages
The State of The Art in Image and Video Retrieval
No ratings yet
The State of The Art in Image and Video Retrieval
7 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Video Classification:A Literature Survey: Pravina Baraiya Asst. Prof. Disha Sanghani
No ratings yet
Video Classification:A Literature Survey: Pravina Baraiya Asst. Prof. Disha Sanghani
5 pages
EAI Endorsed Transactions: Automatic Video Classification: A Review
No ratings yet
EAI Endorsed Transactions: Automatic Video Classification: A Review
5 pages
High-Level Concept Detection in Video
No ratings yet
High-Level Concept Detection in Video
11 pages
Concept-Based Video Retrieval: by Cees G. M. Snoek and Marcel Worring
No ratings yet
Concept-Based Video Retrieval: by Cees G. M. Snoek and Marcel Worring
110 pages
TRECVID Chapter Complete
No ratings yet
TRECVID Chapter Complete
22 pages
Fusion of Demands in Review of Bag-Of-Visual Words: Silkesha Thigale, A.B Bagwan
No ratings yet
Fusion of Demands in Review of Bag-Of-Visual Words: Silkesha Thigale, A.B Bagwan
4 pages
Genre Classification of Web Pages: - User Study and Feasibility Analysis
No ratings yet
Genre Classification of Web Pages: - User Study and Feasibility Analysis
15 pages
Bag of Words Model: Unlocking Visual Intelligence with Bag of Words
From Everand
Bag of Words Model: Unlocking Visual Intelligence with Bag of Words
Fouad Sabry
No ratings yet
Image Compression: Efficient Techniques for Visual Data Optimization
From Everand
Image Compression: Efficient Techniques for Visual Data Optimization
Fouad Sabry
No ratings yet
Human Visual System Model: Understanding Perception and Processing
From Everand
Human Visual System Model: Understanding Perception and Processing
Fouad Sabry
No ratings yet
Artemis
No ratings yet
Artemis
11 pages
How Many High-Level Concepts Will Fill The Semantic Gap in Video Retrieval?
No ratings yet
How Many High-Level Concepts Will Fill The Semantic Gap in Video Retrieval?
8 pages
Extargerea Semanticii PDF
No ratings yet
Extargerea Semanticii PDF
31 pages
Braving the Semantic Gap Mapping Visual Concepts from Images and Videos 1st Edition by Da Deng 9783540240549 download
No ratings yet
Braving the Semantic Gap Mapping Visual Concepts from Images and Videos 1st Edition by Da Deng 9783540240549 download
44 pages
Combining_visual_and_acoustic_features_for_music_g
No ratings yet
Combining_visual_and_acoustic_features_for_music_g
38 pages
Book Chapter-Ip
No ratings yet
Book Chapter-Ip
21 pages
Asso for Info Science Tech - 2021 - Matthews - Genre analysis of movies using a topic model of plot summaries
No ratings yet
Asso for Info Science Tech - 2021 - Matthews - Genre analysis of movies using a topic model of plot summaries
17 pages
An Ontology Framework For Knowledge-Assisted Semantic Video Analysis and Annotation
No ratings yet
An Ontology Framework For Knowledge-Assisted Semantic Video Analysis and Annotation
10 pages
VideoMiningBook Chapter7 PDF
No ratings yet
VideoMiningBook Chapter7 PDF
40 pages
Audio Visual Speech Recognition: Advancements, Applications, and Insights
From Everand
Audio Visual Speech Recognition: Advancements, Applications, and Insights
Fouad Sabry
No ratings yet
Evaluating bag-of-visual-words representations in scene classific
No ratings yet
Evaluating bag-of-visual-words representations in scene classific
11 pages
Semantic Hierarchies For Image Annotation - A Survey
No ratings yet
Semantic Hierarchies For Image Annotation - A Survey
41 pages
evaluation of feature extractors and psycho-acoustic transformations for music genre classifications
No ratings yet
evaluation of feature extractors and psycho-acoustic transformations for music genre classifications
8 pages
Motion Estimation: Advancements and Applications in Computer Vision
From Everand
Motion Estimation: Advancements and Applications in Computer Vision
Fouad Sabry
No ratings yet
Braving the Semantic Gap Mapping Visual Concepts from Images and Videos 1st Edition by Da Deng 9783540240549 - The ebook in PDF format is available for download
100% (2)
Braving the Semantic Gap Mapping Visual Concepts from Images and Videos 1st Edition by Da Deng 9783540240549 - The ebook in PDF format is available for download
50 pages
1 s2.0 S0167923617300994 Main
No ratings yet
1 s2.0 S0167923617300994 Main
11 pages
VideoTextWACV
No ratings yet
VideoTextWACV
8 pages
Movie Genre Classification
No ratings yet
Movie Genre Classification
5 pages
IJCRT24A5009
No ratings yet
IJCRT24A5009
5 pages
Image Search Engines
100% (1)
Image Search Engines
54 pages
Content-Based Image Retrieval by Ontology-Based Object Recognition
No ratings yet
Content-Based Image Retrieval by Ontology-Based Object Recognition
10 pages
Bag of Features
No ratings yet
Bag of Features
49 pages
Video PDF
No ratings yet
Video PDF
9 pages
Stephen W. Smoliar, Hongliang Zhang - Content Based Video Indexing and Retrieval (1994)
No ratings yet
Stephen W. Smoliar, Hongliang Zhang - Content Based Video Indexing and Retrieval (1994)
11 pages
6454-31248-1-PB
No ratings yet
6454-31248-1-PB
7 pages
A Survey On Visual Content-Based Video Indexing and Retrieval
No ratings yet
A Survey On Visual Content-Based Video Indexing and Retrieval
23 pages
Learn OpenCV with Python by Examples
From Everand
Learn OpenCV with Python by Examples
James Chen
No ratings yet
Music Information Retrieval: Figure 1.1: An Enumeration of Some Tasks in The General Field of Music Information
No ratings yet
Music Information Retrieval: Figure 1.1: An Enumeration of Some Tasks in The General Field of Music Information
14 pages
Video Shot Boundary Detection Using Block Based Cumulative Approach
No ratings yet
Video Shot Boundary Detection Using Block Based Cumulative Approach
24 pages
Sikos, Leslie F - Description Logics in Multimedia Reasoning-Springer (2017)
No ratings yet
Sikos, Leslie F - Description Logics in Multimedia Reasoning-Springer (2017)
215 pages
A Survey On The Automatic Indexing of Video Data 1999 Journal of Visual Communication and Image Representation
No ratings yet
A Survey On The Automatic Indexing of Video Data 1999 Journal of Visual Communication and Image Representation
35 pages
Object Detection: Advances, Applications, and Algorithms
From Everand
Object Detection: Advances, Applications, and Algorithms
Fouad Sabry
No ratings yet
Image Graphics
No ratings yet
Image Graphics
26 pages
Sports Video Annotation Using Enhanced HSV Histograms in Multimedia Ontologies
No ratings yet
Sports Video Annotation Using Enhanced HSV Histograms in Multimedia Ontologies
8 pages
Content-Based Filtering For Video Sharing Social Networks
No ratings yet
Content-Based Filtering For Video Sharing Social Networks
28 pages
Video Text Detection by Tong Lu, Shivakumara Palaiahnakote, Chew Lim Tan
No ratings yet
Video Text Detection by Tong Lu, Shivakumara Palaiahnakote, Chew Lim Tan
272 pages
Information Retrieval Beyond The Text Document
No ratings yet
Information Retrieval Beyond The Text Document
34 pages
Video Content Representation With Grammar For Semantic Retrieval
No ratings yet
Video Content Representation With Grammar For Semantic Retrieval
6 pages
Scene recognition by semantic visual words
No ratings yet
Scene recognition by semantic visual words
20 pages
Content-Based Image Retrieval at The End of The Early Years
No ratings yet
Content-Based Image Retrieval at The End of The Early Years
32 pages
Content-Based Image Retrieval at The End of The Early Years
No ratings yet
Content-Based Image Retrieval at The End of The Early Years
32 pages
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
No ratings yet
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
11 pages
Semantics-Based Image Retrieval by Region Saliency
No ratings yet
Semantics-Based Image Retrieval by Region Saliency
9 pages
Cloud Computing Reference Model PDF
No ratings yet
Cloud Computing Reference Model PDF
3 pages
Transforming Dental Health in Rural Communities Digital Dentistry Accessible DOCX Download
100% (6)
Transforming Dental Health in Rural Communities Digital Dentistry Accessible DOCX Download
15 pages
SSC + OSSC - CGL (Syllabus)
No ratings yet
SSC + OSSC - CGL (Syllabus)
5 pages
Fiori Presentation
No ratings yet
Fiori Presentation
3 pages
EXAM 12 - Icpna
No ratings yet
EXAM 12 - Icpna
10 pages
074-446-P1J XTC3 Om PDF
No ratings yet
074-446-P1J XTC3 Om PDF
216 pages
Ace Up The Sleeve - AD DACL Backdoor
No ratings yet
Ace Up The Sleeve - AD DACL Backdoor
60 pages
SNEHA KACHARE (Resume)
No ratings yet
SNEHA KACHARE (Resume)
2 pages
Basic College Mathematics through Applications 5th Edition Akst Solutions Manual instant download
100% (4)
Basic College Mathematics through Applications 5th Edition Akst Solutions Manual instant download
42 pages
ITIL® 4 Foundation Exam Full Preparation ITIL® 4 Foundation Exam, Latest Version by Daccache, Georgio
No ratings yet
ITIL® 4 Foundation Exam Full Preparation ITIL® 4 Foundation Exam, Latest Version by Daccache, Georgio
52 pages
Manual Sonometro PDF
No ratings yet
Manual Sonometro PDF
11 pages
Cloud Case Study
No ratings yet
Cloud Case Study
4 pages
2017 - SCL (2nd ed) - [Corbett] (2017)
No ratings yet
2017 - SCL (2nd ed) - [Corbett] (2017)
6 pages
Poking A Hole in Firewall ProofRead Final
No ratings yet
Poking A Hole in Firewall ProofRead Final
16 pages
Namespace SAP
No ratings yet
Namespace SAP
12 pages
Stored Procedures: Page 1 of 5
No ratings yet
Stored Procedures: Page 1 of 5
5 pages
Cdweb Control For Proflow: System Manual
No ratings yet
Cdweb Control For Proflow: System Manual
169 pages
TAFJ Distribution
100% (1)
TAFJ Distribution
11 pages
Data Sheet - Series 7000 Meter DS
No ratings yet
Data Sheet - Series 7000 Meter DS
6 pages
FHA User Manual
No ratings yet
FHA User Manual
126 pages
B360 July2023 (SAP) ExtensionForCustomer360 en
No ratings yet
B360 July2023 (SAP) ExtensionForCustomer360 en
89 pages
CCNA4 Case Study Inst en
No ratings yet
CCNA4 Case Study Inst en
9 pages
Reviewer Finals For Purposive Communication
No ratings yet
Reviewer Finals For Purposive Communication
15 pages
Java Complete CheatSheet
No ratings yet
Java Complete CheatSheet
20 pages
Ats 385 Manual en
No ratings yet
Ats 385 Manual en
36 pages
Acosta Jillian Rechelle Acosta Thesisfinal
No ratings yet
Acosta Jillian Rechelle Acosta Thesisfinal
112 pages
Ip PBX and Unified Communications System
No ratings yet
Ip PBX and Unified Communications System
13 pages
EvaluationQuickStartGuide
No ratings yet
EvaluationQuickStartGuide
8 pages
Advanced Industrial Automation PLC programming in simplest way with 110 solved examples 1st Edition Himanshu Kumar instant download
100% (3)
Advanced Industrial Automation PLC programming in simplest way with 110 solved examples 1st Edition Himanshu Kumar instant download
64 pages
BCA 2nd Sem - Prog in C
No ratings yet
BCA 2nd Sem - Prog in C
25 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Semantic Class Detectors in Video Genre Recognition

Uploaded by

Semantic Class Detectors in Video Genre Recognition

Uploaded by

SEMANTIC CLASS DETECTORS IN VIDEO GENRE

Michal Hradiš 1 , Ivo Řeznı́ček1 and Kamil Behúň1

1 INTRODUCTION oped for MediaEval 2011 Genre Tagging Task (Lar-

which it was not trained on. This way, no knowledge 3 http://mrim.imag.fr/tvca/

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.