Download the translation of the article "A new pitch range estimation method for single channel speech signal separation in the modulation frequency domain" – Springer journal

Persian title:	Single channel speech signal separation in the modulation frequency domain based on a new pitch range estimation method
عنوان انگلیسی مقاله:	Single channel speech separation Processing in modulation frequency domain based on a novel pitch range estimation method

Publication year	2012
Number of pages (English article)	10 pages
Number of pages (translation)	26 pages
Journal	Advances in Signal Processing
University	Faculty of Electrical and Computer Engineering, Yazd University
Keywords	–
Publisher	Springer

Table of contents:

Abstract
1 Introduction
2 Modulation frequency analysis
3 System description

3.1 T-F decomposition and modulation transform
3.2 Pitch range estimation in the modulation frequency domain
3.3 Speech separation

4 Evaluation

4.1 Pitch range estimation
4.2 Voiced speech separation

5 Discussion and conclusion

Excerpt from the translation:

Introduction

Speech separation, as a solution to the cocktail party problem, is regarded as a well-known challenge with important foundations and implications. To make the point, consider telecommunication systems or automatic speech recognition systems that do not work properly in the presence of interfering sounds. An effective system that separates speech from interference in monaural (single-microphone) conditions would go a long way toward solving these problems. Many methods have been proposed for monaural speech enhancement; see, for example, the references. Such methods usually assume specific statistical properties for the interference and lack the capacity to handle the many different kinds of interference. Although monaural speech separation does not perform well, the human auditory system does this effectively and efficiently. This perceptual process is considered under auditory scene analysis (ASA). Psychoacoustic research on ASA has inspired considerable work on developing computational auditory scene analysis (CASA) systems for speech separation (see the references for a comprehensive review). According to Bregman, the ASA procedure can be divided into two theoretical stages: segmentation and grouping. In the first stage, speech is transformed into a higher-dimensional space (such as a two-dimensional time-frequency representation), and then similar time-frequency (T-F) units are segmented to form different regions. In the second stage, these regions are combined into different streams based on the relevant acoustic information. The main computational goal of CASA is to separate the target speech signal from the interference for various purposes through a soft or binary T-F mask; for more information, see the references.

Excerpt from the English article:

Introduction

Speech separation, as a solution to the cocktail party problem, is a well-known challenge with important applications. To illustrate the point, consider telecommunication systems or automatic speech recognition systems, which lose performance in the presence of interfering sounds [1,2]. An effective system that segregates speech from interference in monaural (single-microphone) situations would therefore be valuable for such problems. Many methods have been proposed for monaural speech enhancement; for example, see [3-7]. These methods usually assume certain statistical properties for the interference and tend to lack the capacity to deal with a variety of interferences. While monaural speech separation systems perform poorly, the human auditory system performs this task proficiently. The underlying perceptual process is known as Auditory Scene Analysis (ASA) [5]. Psychoacoustic research on ASA has inspired considerable work on developing Computational Auditory Scene Analysis (CASA) systems for speech separation (see [6,7] for a comprehensive review).

According to Bregman [5], the ASA procedure can be separated into two theoretical stages: segmentation and grouping. In the first stage, speech is transformed into a higher-dimensional space (such as a two-dimensional time-frequency representation), and then similar time-frequency (T-F) units are segmented to compose different regions [6]. In the second stage, these regions are combined into different streams based on the relevant acoustic information. The major computational goal of CASA is to separate the target speech signal from the interference for different purposes by generating a binary or a soft T-F mask; see, e.g., [8-10]. Grouping itself consists of simultaneous and sequential organization, which involve grouping segments across frequency and across time, respectively. The task of sequential grouping is to group the T-F regions belonging to the same sound source across time.
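The binary T-F mask mentioned above can be made concrete with a small NumPy sketch. It assumes an ideal binary mask, i.e. one computed from separately known target and interference signals with a 0 dB local criterion (a common oracle reference in CASA work), rather than the mask the authors estimate; the helper names stft, istft and ideal_binary_mask, and all window and hop sizes, are illustrative assumptions.

```python
import numpy as np


def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT: returns an array of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=256, hop=64, length=None):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    n_out = n_fft + hop * (X.shape[0] - 1)
    y = np.zeros(n_out)
    norm = np.zeros(n_out)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    y /= np.maximum(norm, 1e-8)  # samples never covered by a window stay 0
    if length is not None:
        y = np.pad(y, (0, max(0, length - n_out)))[:length]
    return y


def ideal_binary_mask(target, interference, lc_db=0.0, n_fft=256, hop=64):
    """1 for T-F units whose local target-to-interference ratio exceeds lc_db, else 0."""
    S = np.abs(stft(target, n_fft, hop)) ** 2
    N = np.abs(stft(interference, n_fft, hop)) ** 2
    local_snr_db = 10 * np.log10((S + 1e-12) / (N + 1e-12))
    return (local_snr_db > lc_db).astype(float)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 500 * t)          # stand-in "target" source
    interference = np.sin(2 * np.pi * 2000 * t)   # stand-in interferer
    mixture = target + interference
    mask = ideal_binary_mask(target, interference)
    estimate = istft(mask * stft(mixture), length=len(mixture))
    inner = slice(256, len(t) - 256)  # skip edge samples with partial window coverage
    err_before = np.mean((mixture[inner] - target[inner]) ** 2)
    err_after = np.mean((estimate[inner] - target[inner]) ** 2)
    print(f"MSE vs target: mixture {err_before:.4f}, masked {err_after:.6f}")
```

Keeping a unit when its local SNR exceeds the criterion and discarding it otherwise is what makes this a binary mask; a soft (ratio) mask would instead scale each unit by a value between 0 and 1.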
Figure 1 illustrates sequential grouping: the upper panel shows T-F regions grouped into one single stream, as they are close enough in both the time and frequency directions, while the lower panel illustrates the case of two speech streams, grouped separately because the T-F regions are sufficiently far from each other in the frequency direction. Temporal continuity is an effective cue for grouping T-F regions that neighbor each other in time. However, it cannot handle T-F regions that do not overlap in time because of intervening silence or interference segments. Therefore, sequential grouping of such T-F regions is a very challenging problem (see [11,12] for more details).

Natural speech includes both voiced and unvoiced portions. Voiced portions of speech are characterized by periodicity (or harmonicity), which has been used as an important feature in many CASA systems for segregating voiced speech (see, e.g., [13,14]). Despite considerable advances in voiced speech separation, the performance of current CASA systems is still limited by pitch frequency (F0) estimation errors and residual noise. Various methods have been proposed for robust pitch frequency estimation (see, e.g., [15,16]); however, robust pitch frequency estimation in low signal-to-noise ratio (SNR) situations still poses a significant challenge. While mixed speech may overlap heavily in the time domain, modulation frequency analysis provides an additional dimension that can offer a greater degree of separation among sources.
In other words, the original T-F representation obtained from transformations such as the Short-Time Fourier Transform (STFT) can be augmented with a third dimension that represents modulation frequency. In [17], assuming that the pitch frequency range is known and constant in each filter channel, modulation spectral analysis is used as a tool for producing the speech separation mask in a higher-dimensional space.

Based on the above observations, we propose a new system for single channel separation of voiced speech based on modulation filtering. The idea is first to estimate the target pitch (frequency) range in the modulation frequency domain, and then to use this range to produce the proper mask for speech separation. Modulation analysis and filtering are applied to the target speech separation problem for the following reasons, given in [18]. First, it is widely believed that the human ASA system processes sounds in the modulation frequency domain. Second, the energy from two co-channel talkers is largely non-overlapping in the modulation frequency domain. Modulation analysis and filtering have been studied extensively by many researchers in the field of single channel speech separation; [19] provides a general discussion of this subject.
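The last two paragraphs describe adding modulation frequency as a third dimension and estimating the target pitch range there. This excerpt does not give the authors' estimator, so the following is a hypothetical stand-in, not their method: a wideband STFT (so that several harmonics share each channel and the channel envelope fluctuates at F0), followed by a second DFT of each channel's magnitude envelope over overlapping blocks, which yields an array indexed by (block, acoustic channel, modulation frequency). The pitch range is then taken as the strongest modulation peak between 70 and 350 Hz, widened by ±20 Hz. The function names, window and hop sizes, search range and half-width are all assumptions chosen for an 8 kHz example.

```python
import numpy as np


def modulation_transform(x, fs, n_fft=32, hop=8, mod_len=128, mod_hop=32):
    """Acoustic-frequency x modulation-frequency analysis, computed block by block.

    Returns (acoustic_freqs, mod_freqs, M) where M has shape
    (n_blocks, n_channels, n_mod_freqs).
    """
    # Wideband STFT: a short window lets several pitch harmonics fall into one
    # channel, so that channel's magnitude envelope fluctuates at the pitch rate.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    envelope = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_channels)

    # Second DFT along time (frame index) for every channel, over overlapping blocks.
    mod_win = np.hanning(mod_len)[:, None]
    n_blocks = 1 + (n_frames - mod_len) // mod_hop
    blocks = []
    for b in range(n_blocks):
        seg = envelope[b * mod_hop:b * mod_hop + mod_len]
        seg = seg - seg.mean(axis=0)  # remove each channel's mean envelope level
        blocks.append(np.abs(np.fft.rfft(seg * mod_win, axis=0)).T)
    M = np.stack(blocks)

    acoustic_freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    mod_freqs = np.fft.rfftfreq(mod_len, d=hop / fs)  # envelope sample rate is fs / hop
    return acoustic_freqs, mod_freqs, M


def estimate_pitch_range(x, fs, f0_min=70.0, f0_max=350.0, half_width=20.0):
    """Hypothetical estimator: strongest modulation peak in [f0_min, f0_max],
    averaged over blocks and channels, widened to +/- half_width Hz."""
    _, mod_freqs, M = modulation_transform(x, fs)
    profile = M.mean(axis=(0, 1))
    search = (mod_freqs >= f0_min) & (mod_freqs <= f0_max)
    peak = mod_freqs[search][np.argmax(profile[search])]
    return max(f0_min, peak - half_width), min(f0_max, peak + half_width)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    # Synthetic "voiced" signal: 25 equal-amplitude harmonics of F0 = 150 Hz, plus noise.
    voiced = sum(np.cos(2 * np.pi * h * 150 * t) for h in range(1, 26)) / 25
    noisy = voiced + 0.05 * rng.standard_normal(len(t))
    _, _, M = modulation_transform(noisy, fs)
    print("modulation array shape:", M.shape)  # (blocks, channels, modulation bins)
    print("estimated pitch range (Hz):", estimate_pitch_range(noisy, fs))
```

The frame hop sets the envelope sampling rate (fs / hop = 1000 Hz here), which must exceed twice the highest pitch of interest, while the block length (128 frames) sets the modulation frequency resolution (7.8125 Hz). Turning the estimated range into a separation mask, as the paper does, is not shown.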
