Download the Persian translation of the article: field-programmable gate arrays, fault tolerance and reliability



Free download of the English article + purchase of the Persian translation
Persian title: Field programmable gate, fault tolerance and reliability
English title: Fault tolerance and reliability in field programmable gate arrays


Details of the English article (PDF) and the translation (Word)
Year of publication: 2010
Length of the English article: 16 pages, PDF format
Length of the translation: 30 pages, Word format
Related fields: information technology, computer engineering, electrical and electronic engineering
Publisher: The Institution of Engineering and Technology (IET)
Institution: Department of Electrical and Electronic Engineering, Imperial College, London, UK
Keywords: reliability, fault tolerance, field-programmable gate array
Journal: IET Computers & Digital Techniques



Excerpt from the translation:

Reduced device-level reliability and increased within-die process variability can be regarded as very important issues for field-programmable gate arrays (FPGAs), and they lead to faults developing dynamically over the lifetime of the integrated circuit. Fortunately, FPGAs can be reconfigured in the field at runtime, and so they provide opportunities to overcome such faults. This study presents a comprehensive survey of fault detection methods and fault-tolerance schemes for FPGAs in the context of device degradation, with the aim of laying a strong foundation for future research in this area. All methods and schemes are compared quantitatively, and some of them are highlighted.

Excerpt from the English article:


Abstract: Reduced device-level reliability and increased within-die process variability will become serious issues for future field-programmable gate arrays (FPGAs), and will result in faults developing dynamically during the lifetime of the integrated circuit. Fortunately, FPGAs have the ability to reconfigure in the field and at runtime, thus providing opportunities to overcome such degradation-induced faults. This study provides a comprehensive survey of fault detection methods and fault-tolerance schemes specifically for FPGAs and in the context of device degradation, with the goal of laying a strong foundation for future research in this field. All methods and schemes are quantitatively compared and some particularly promising approaches are highlighted.

1 Introduction

As process technology scaling continues, integrated circuits face greater challenges from defects, process variability and reliability. Field-programmable gate arrays (FPGAs) are no exception to this; one recent study suggested that defect tolerance will be necessary in future large FPGAs at and beyond the 45 nm technology node [1]. FPGAs have some key advantages over application-specific integrated circuits (ASICs) for achieving fault tolerance. Firstly, they are (mostly) composed of regular arrays of generic resources, giving them inherent redundancy. Secondly, they can be reconfigured in the field. These properties have been exploited in a wealth of research, and some promising fault-tolerant systems have been developed.

There have been different motivations for designs of fault-tolerant FPGA systems. Early work concentrated on increasing manufacturing yield through defect tolerance, and some of this has found its way into commercial use [2]. The advent of SRAM FPGAs presented the problem of single-event upsets (SEUs), which are sporadic flips of configuration bits causing connectivity, logic and state errors.
This has also led to a great deal of research, the benefits of which can be widely found in space and nuclear applications [3]. The focus of this study, however, is on work relating to the reliability of FPGAs and in-field tolerance of permanent faults caused by device degradation. This is a less established field of research, although techniques developed in defect and SEU tolerance schemes are highly relevant to degradation fault tolerance. This aspect of fault tolerance is set to become increasingly important with the continuing development of silicon technology.

This paper is based on material previously published by the authors in [4]. It is extended with new sections on fault modelling and future work in the field. Existing sections are discussed in greater depth, with 17 additional papers surveyed and eight new figures.

A fault-tolerant system consists of two main components: fault detection and fault repair. Section 3 surveys fault detection methods and Section 4 considers fault repair. Causes of faults, modelling and application issues are discussed in Section 2, and the possibilities for future development of the field are explored in Section 5.

IET Comput. Digit. Tech., 2010, Vol. 4, Iss. 3, pp. 196–210, doi: 10.1049/iet-cdt.2009.0011, © The Institution of Engineering and Technology 2010, www.ietdl.org

2 Background

2.1 Causes of degradation

Degradation is the permanent deterioration of a circuit over time, resulting in a negative impact on performance. The effects can be progressive (a gradual change in a circuit parameter) or catastrophic (the sudden onset of a failed state in a circuit component). Degradation in VLSI circuits can be attributed to a number of mechanisms [5]. The hot-carrier effect leads to a build-up of trapped charges in the gate-channel interface region [6]. This causes a gradual reduction in channel mobility and an increase in threshold voltage in CMOS transistors.
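As a rough illustration of how a gradual parameter shift of this kind can end in a delay fault, the Python sketch below assumes a power-law drift of threshold voltage with stress time, ΔVth = A·t^N, and the alpha-power-law delay model, in which gate delay scales with VDD/(VDD − Vth)^ALPHA. Both are common empirical approximations, not models taken from this paper, and every constant (VDD, VTH0, ALPHA, A, N) and function name is a hypothetical placeholder. A path is flagged as faulty the first time its delay exceeds the clock period.

```python
"""Toy model of gradual threshold-voltage drift turning into a delay fault.

Every constant below is a hypothetical placeholder, not a value from the paper.
"""
from typing import Optional

VDD = 1.0          # supply voltage (V)
VTH0 = 0.30        # threshold voltage of a fresh transistor (V)
ALPHA = 1.3        # velocity-saturation exponent in the alpha-power law
A, N = 0.005, 0.2  # assumed power-law drift: delta_Vth = A * t**N, t in hours


def threshold_voltage(t_hours: float) -> float:
    """Threshold voltage after t_hours of stress under the assumed drift law."""
    return VTH0 + A * t_hours ** N


def path_delay(t_hours: float, fresh_delay_ns: float) -> float:
    """Scale a fresh path delay by the alpha-power-law ratio.

    Gate delay is taken as proportional to VDD / (VDD - Vth)**ALPHA, so the
    VDD factor cancels in the ratio of aged to fresh delay.
    """
    aged = (VDD - threshold_voltage(t_hours)) ** ALPHA
    fresh = (VDD - VTH0) ** ALPHA
    return fresh_delay_ns * fresh / aged


def first_delay_fault(fresh_delay_ns: float, clock_ns: float,
                      horizon_hours: float = 1e6,
                      step_hours: float = 100.0) -> Optional[float]:
    """Return the first sampled time (hours) at which the path misses the clock,
    or None if it still meets timing at the end of the horizon."""
    t = 0.0
    while t <= horizon_hours:
        if path_delay(t, fresh_delay_ns) > clock_ns:
            return t
        t += step_hours
    return None


print(first_delay_fault(9.5, 10.0))  # 5% slack: fails after a few thousand hours
print(first_delay_fault(5.0, 10.0))  # 50% slack: None within the horizon
```

With these made-up constants, a path that starts with 5% timing slack misses the clock after a few thousand hours, while a path with 50% slack survives the whole horizon. The point is only that a slow, progressive drift eventually crosses a hard timing limit, which is why degradation shows up as delay faults.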
The effect on the circuit is that switching speeds become slower, leading to delay faults. Negative-bias temperature instability (NBTI) has similar consequences for circuits and is also caused by a build-up of trapped charges [7]. Electromigration is a mechanism by which metal ions migrate over time, leading to voids and deposits in interconnects. Eventually, these can cause faults through the creation of open and short circuits [8]. Time-dependent dielectric breakdown (TDDB) affects the gates of transistors, causing an increase in leakage current and eventually a short circuit. The mechanism here is the creation of charge traps within the gate dielectric, diminishing the potential barrier it forms [9, 10]. All of these degradation mechanisms have the potential to become more severe with the shrinking of process geometry. This is due to increasing gate field strength, higher current density, smaller feature size, thinner gate dielectrics and increasing variability [11]. In the case of TDDB, the situation is further complicated by the introduction of new processes such as high-K dielectrics and metal gates [12].

2.2 Other types of fault

In addition to degradation, there are two other types of fault that can affect FPGAs. These are relevant to this study because some of the techniques developed in response to them can also be applied to faults caused by degradation. The first of these is manufacturing defects. Manufacturing defects can appear as circuit nodes that are stuck at 0 or 1, or that switch too slowly to meet the timing specification. Defects also affect the interconnect network and can cause short or open circuits and stuck-open or stuck-closed pass transistors [13]. Testing for manufacturing defects is well established in VLSI, and defect tolerance techniques are currently used in some types of device, including FPGAs [2], to increase yield.
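The stuck-at defects described here are usually handled by single stuck-at fault simulation: each node is forced to 0 or 1 in turn, and a set of test vectors is checked for whether any of them exposes a wrong output. The Python sketch below uses a hypothetical three-input gate-level circuit (n1 = a AND b, y = n1 OR c) rather than a real FPGA netlist; the circuit, fault list and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical example circuit: n1 = a AND b, y = n1 OR c.
NODES = ("a", "b", "c", "n1", "y")
FAULTS = [(node, value) for node in NODES for value in (0, 1)]  # single stuck-at faults


def simulate(a: int, b: int, c: int, fault=None) -> int:
    """Evaluate the circuit output; fault is (node, stuck_value) or None."""
    def force(node: str, value: int) -> int:
        # A stuck-at fault overrides whatever value the node would carry.
        if fault is not None and fault[0] == node:
            return fault[1]
        return value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def detected_faults(tests):
    """Faults for which at least one test vector gives a wrong output."""
    return {
        fault for fault in FAULTS
        if any(simulate(*vec, fault=fault) != simulate(*vec) for vec in tests)
    }


def fault_coverage(tests) -> float:
    """Fraction of the single stuck-at faults detected by the test set."""
    return len(detected_faults(tests)) / len(FAULTS)


exhaustive = list(product((0, 1), repeat=3))
compact = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
print(fault_coverage(exhaustive), fault_coverage([(1, 1, 0)]), fault_coverage(compact))
```

Exhaustive testing (all eight vectors) detects all ten faults, a single vector detects only four, and the four vectors in `compact` already reach full coverage, which is why test generation aims for small, targeted vector sets. This model says nothing about slow or marginal switching, so delay faults need separate treatment.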
The second class of fault widely discussed in relation to FPGAs comprises SEUs and single-event transients (SETs), caused by certain types of radiation [14]. These are of particular concern in aviation, nuclear research and space applications, where devices are exposed to higher levels of radiation and high levels of reliability are required. The most commonly considered failure mode is the flipping of an SRAM cell in the configuration memory, leading to an error in the logic function that persists until the configuration is refreshed in a process known as scrubbing. Although this recovery method is not applicable to permanent faults caused by degradation, ways of detecting SEU faults are relevant.

2.3 Modelling of faults

To detect, locate and repair faults effectively, a model is needed of how they affect the circuit. Fault modelling has several aspects, including (a) determining which fault mechanisms may occur; (b) simulating the effect that possible faults will have on the system; (c) predicting the rate and distribution of failures; and (d) establishing fault scenarios for evaluating potential repair strategies.

Faults can be modelled at different layers of the FPGA, as shown in Fig. 1. Although faults occur in the silicon structures that make up transistors and interconnect, fault-tolerant systems deal with them at various levels of abstraction. A repair at each level of abstraction aims to be transparent to the level above it.

Logic: A low-level approach considers the underlying logic of the FPGA and models faults on particular circuit nets.

Fabric: Some fault-tolerant systems consider faults in the FPGA fabric, that is, the set of LUTs, registers, interconnect and so on that is available to the designer [15]. This has the advantage that these elements are easy to test with reconfiguration and built-in self-test (BIST), though the behaviour of the configuration logic is obscured.
Array: A popular option is to consider the FPGA at the array level, that is, to mark off entire clusters or interconnect lines as faulty. This best exploits the regular structure of FPGAs.

Application: A higher level of abstraction is possible when the application is modular and adaptable. This allows the fault model to extend to other parts of the circuit outside the FPGA, giving a very robust system.

Figure 1: Design of an FPGA and its application can be abstracted to several levels. Fault modelling and tolerance can be approached at numerous points in the hierarchy.

Within this study, fault repair is defined as the repair of a faulty system so that it returns to being fully operational. Invariably, at some level of the FPGA, this repair is achieved by replacing a failed component with a functional one. The size and nature of the replaced component vary from scheme to scheme, and this represents the granularity of the approach. All of the studies surveyed here fall into the fabric-level, array-level or application-level categories.

Approaching fault tolerance at different levels of abstraction places the burden of dealing with faults on different parties over the design, manufacturing and service phases of the product lifetime. A fabric-level repair, for example, may be completely transparent to the engineer who designs the application circuit and requires no alteration of the configuration data. On the other hand, an application-level strategy is likely to be embedded into the system design and tailored to the application.

An important part of fault modelling is to determine the possible failure modes at the design level under consideration.
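The array-level approach and the notion of repair granularity described above can be made concrete with a small sketch. It assumes a hypothetical rectangular array of logic clusters in which a coarse-grained scheme retires a whole column whenever that column contains a faulty cluster and shifts the design onto the remaining columns, while a fine-grained scheme marks off only the faulty clusters. The array geometry, the column-shifting rule and all names are illustrative assumptions, not a scheme from this survey.

```python
from typing import Optional

Cluster = tuple[int, int]  # (row, column) of a logic cluster in a hypothetical array


def column_level_repair(cols: int, needed_cols: int,
                        faulty: set[Cluster]) -> Optional[list[int]]:
    """Coarse granularity: retire every column that contains a faulty cluster.

    Logical column i is placed on the i-th fault-free physical column (a simple
    shifting rule). Returns None when there are too few fault-free columns.
    """
    bad_cols = {col for (_, col) in faulty}
    healthy = [col for col in range(cols) if col not in bad_cols]
    if len(healthy) < needed_cols:
        return None
    return healthy[:needed_cols]


def usable_clusters(rows: int, cols: int, faulty: set[Cluster],
                    granularity: str) -> int:
    """Clusters still available after marking off faulty resources."""
    if granularity == "cluster":  # fine-grained: only the faulty clusters are lost
        return rows * cols - len(faulty)
    if granularity == "column":   # coarse-grained: whole columns are lost
        bad_cols = {col for (_, col) in faulty}
        return rows * (cols - len(bad_cols))
    raise ValueError(f"unknown granularity: {granularity!r}")


same_column = {(0, 2), (3, 2)}
split = {(0, 2), (1, 4)}
print(column_level_repair(6, 5, same_column))  # maps onto columns 0, 1, 3, 4, 5
print(usable_clusters(4, 6, same_column, "cluster"),
      usable_clusters(4, 6, same_column, "column"))
print(column_level_repair(6, 5, split))        # too few spare columns: None
```

In a 4 × 6 array with two faults in the same column, column-level repair still fits a five-column design but leaves 20 usable clusters instead of 22; if the two faults fall in different columns, the coarse scheme runs out of spares. Finer granularity wastes fewer resources, at the cost of more precise fault location and more complex remapping.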
At the circuit level, the simplest fault model assumes that faulty circuit nodes can be stuck at 0, stuck at 1, shorted to another node or open-circuit [13]. Although these hard-failure modes have long been an effective approach to defect testing, worsening process variation and degradation require marginal and timing faults to be considered [16]. Since some VLSI wear-out mechanisms are progressive in nature, marginal faults are likely to be more prevalent in field failures than in failures caused by manufacturing defects. Examples of marginal faults include slow switching, intermittent switching, weak drivers and unstable registers.

Another aspect of a degradation fault model is the rate at which faults occur and how this varies over time [5]. Traditionally, a bathtub curve of failures is described for VLSI circuits, in common with many other manufacturing processes: high numbers of ‘infant mortality’ failures occur shortly after manufacture, then the failure rate remains low until the end of the design life. Greater process variation and degradation will make these phases less distinct; for example, a significant background rate of failure may be observed over the entire life of the product [11]. This is illustrated in Fig. 2.

FPGA systems can be reconfigured in the field, either to change their functionality or as required by a fault-tolerance system. This raises the possibility of dormant faults: faults that occur on resources which are unused at the time but which may be used in the future. The implication is that multiple faults may become apparent on reconfiguration, and the system must be prepared for this.

2.4 Applications of fault tolerance

All of the fault detection and repair methods surveyed have individual strengths and weaknesses, and which method is most appropriate depends on the application. In some cases, reliability is critical for safety or mission success.
For example, an automotive application was discussed at the system level by Steininger et al. [17]. Fast detection and/or error correction is crucial here so that erroneous data or state is not acted upon, which could be hazardous. A widely implemented application of fault tolerance in FPGAs is in space missions. Traditionally, this is because (a) space missions experience significant numbers of SEUs caused by increased radiation; (b) the breakdown of an electronic system could cause the mission to be lost; and (c) manual repair is impractical. In light of the variability and reliability concerns associated with future VLSI process nodes, it may become economical to use fault tolerance in general-purpose, high-volume applications. In this case, it will be important that the detection and repair method has the lowest possible overhead on timing performance and area. Such applications may be able to compromise on data integrity and fault coverage to achieve this; for example, infrequent visible errors and a small proportion of returns would be tolerable in a consumer video decoder.
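The failure-rate discussion in Section 2.3 (Fig. 2) and the idea of tolerating a small proportion of returns can be linked with a simple reliability calculation. The sketch below assumes a bathtub-shaped hazard built from a decreasing Weibull term (infant mortality), a constant background rate and an increasing Weibull term (wear-out), and computes the expected fraction of units failed by time t as F(t) = 1 − exp(−H(t)), where H is the cumulative hazard. The Weibull decomposition is a common modelling choice rather than one taken from this paper, and all parameter values and names are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BathtubModel:
    """Hazard = infant-mortality Weibull + constant background + wear-out Weibull.

    All default parameter values are hypothetical; time is in hours.
    """
    infant_shape: float = 0.5      # shape < 1 gives a decreasing hazard
    infant_scale: float = 1e9
    background_rate: float = 1e-7  # constant failures per hour
    wearout_shape: float = 4.0     # shape > 1 gives an increasing hazard
    wearout_scale: float = 2e5

    def hazard(self, t: float) -> float:
        """Instantaneous failure rate h(t), defined for t > 0."""
        if t <= 0:
            raise ValueError("t must be positive")

        def weibull_hazard(shape: float, scale: float) -> float:
            return (shape / scale) * (t / scale) ** (shape - 1)

        return (weibull_hazard(self.infant_shape, self.infant_scale)
                + self.background_rate
                + weibull_hazard(self.wearout_shape, self.wearout_scale))

    def cumulative_hazard(self, t: float) -> float:
        """H(t): integral of h from 0 to t (closed form for each term)."""
        if t < 0:
            raise ValueError("t must be non-negative")
        return ((t / self.infant_scale) ** self.infant_shape
                + self.background_rate * t
                + (t / self.wearout_scale) ** self.wearout_shape)

    def fraction_failed(self, t: float) -> float:
        """Expected fraction of units failed by time t: F(t) = 1 - exp(-H(t))."""
        return 1.0 - math.exp(-self.cumulative_hazard(t))


# A "classic" device versus one with a raised background failure rate (hypothetical).
classic = BathtubModel()
degraded = replace(classic, background_rate=1e-6)
two_years = 2 * 365 * 24  # hours

for name, model in (("classic", classic), ("degraded", degraded)):
    print(f"{name}: {100 * model.fraction_failed(two_years):.2f}% failed after two years")
```

With these placeholder values, raising only the background rate (as greater process variation and degradation might) increases the two-year failure fraction from about 0.6% to about 2%. Whether that is an acceptable return rate is exactly the application-dependent judgement discussed above.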

