Combat Computing Systems (continued)


milstar: http://drops.dagstuhl.de/opus/volltexte/2006/732/pdf/06141.AthanasPeter.Paper.732.pdf
Although an FPGA’s clock rate rarely exceeds one-tenth that of a PC, hardware-implemented digital filters can process data at many times the rate of software implementations [4]. Additional performance gains have been described for cryptography [5], network packet filtering [6], target recognition [7] and pattern matching [8], among other applications.
A. Present Day Cost-Performance Comparison
Owing to the prevalence of IEEE standard floating-point in a wide range of applications, several researchers have designed IEEE 754 compliant floating-point accelerator cores constructed out of the Xilinx Virtex-II Pro FPGA’s configurable logic and dedicated integer multipliers [16-18]. Dou et al. published one of the highest performance benchmarks of 15.6 GFLOPS by placing 39 floating-point processing elements on a theoretical Xilinx XC2VP125 FPGA [19]. Interpolating their results for the largest production Xilinx Virtex-II Pro device, the XC2VP100, produces 12.4 GFLOPS, compared to the peak 6.4 GFLOPS achievable for a 3.2 GHz Intel Pentium processor. Assuming that the Pentium can sustain 50% of its peak, the FPGA outperforms the processor by a factor of four for matrix multiplication.
One of the earlier projects demonstrated a 23x speedup on a 2-D FFT through the use of a custom 18-bit floating-point format [26]. More recent work has focused on parameterizable libraries of floating-point units that can be tailored to the task at hand [27-29]. By using a custom floating-point format sized to match the widths of the FPGA’s internal integer multipliers, a speedup of 44 was achieved for a hydrodynamics simulation [30] using four large FPGAs.
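
The factor-of-four claim in the excerpt above can be checked with a few lines of arithmetic. This is only a sketch: every number comes from the quoted text, while the constant and function names are made up for illustration, and the per-PE figure assumes the interpolation scales linearly with the number of processing elements, which the excerpt does not state.

```python
# Figures quoted in the excerpt (Dou et al. and the Pentium comparison).
PES_ON_XC2VP125 = 39               # processing elements on the theoretical device
GFLOPS_ON_XC2VP125 = 15.6          # reported figure for those 39 PEs
GFLOPS_ON_XC2VP100 = 12.4          # interpolated figure for the production device
PENTIUM_PEAK_GFLOPS = 6.4          # 3.2 GHz Pentium, peak
PENTIUM_SUSTAINED_FRACTION = 0.5   # the 50% assumption made in the text


def speedup(fpga_gflops: float, cpu_peak_gflops: float, sustained: float) -> float:
    """Ratio of FPGA throughput to the CPU's sustained throughput."""
    return fpga_gflops / (cpu_peak_gflops * sustained)


# Assumption: throughput is proportional to the PE count.
per_pe = GFLOPS_ON_XC2VP125 / PES_ON_XC2VP125
implied_pes = GFLOPS_ON_XC2VP100 / per_pe
fpga_vs_cpu = speedup(GFLOPS_ON_XC2VP100, PENTIUM_PEAK_GFLOPS,
                      PENTIUM_SUSTAINED_FRACTION)

print(f"GFLOPS per PE:            {per_pe:.2f}")
print(f"PEs implied on XC2VP100:  {implied_pes:.1f}")
print(f"FPGA vs sustained CPU:    {fpga_vs_cpu:.2f}x")
```

The ratio comes out a little under four, so the "factor of four" in the text is a rounded figure.
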
Nakasato and Hamada’s 38 GFLOPS of performance is impressive, even from a cost-performance standpoint. For the cost of their PROGRAPE-3 board, estimated at $15,000, it is likely that a 15-node processor cluster could be constructed producing 196 single precision peak GFLOPS. Even in the unlikely scenario that this cluster could sustain the same 10% of peak performance obtained by Nakasato and Hamada for their software implementation, the PROGRAPE-3 design would still achieve a 2x speedup.
As in many FPGA to CPU comparisons, it is likely that the analysis unfairly favors the FPGA solution. Hardware implementations require specialized skills in digital design and vendor-specific tool flows. Development time and costs are significantly higher than for software. Many comparisons in the literature spend significantly more time optimizing the hardware implementations than they do optimizing their software implementations. Previous research has demonstrated significant compiler inefficiency for common HPC functions [31]. For the DGEMM matrix multiplication function, a hand-coded version outperformed the compiler by greater than eight times.
A total of 39 PEs can be integrated into the xc2vp125-7 FPGA, reaching performance of, e.g., 15.6 GFLOPS with 1600 KB local memory and 400 MB/s external memory bandwidth.
That is 1 IC with 1700 pins and a high cost, on the order of $8000 today.
http://ce.et.tudelft.nl/~george/publications/Conf/FPGA05/FPGA05Dou.pd http://www.xilinx.com/publications/matrix/virtexmatrix.pd Xilinx Virtex FPGA
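
The 2x figure for PROGRAPE-3 against a same-cost cluster follows from the numbers in the excerpt. A minimal sketch, using only the quoted values; the variable names are illustrative, and the GFLOPS-per-dollar line assumes both systems cost the estimated $15,000:

```python
# Values quoted in the excerpt above.
PROGRAPE3_GFLOPS = 38.0         # performance reported for the FPGA board
BOARD_COST_USD = 15_000         # estimated board cost; cluster assumed to cost the same
CLUSTER_PEAK_GFLOPS = 196.0     # 15-node cluster, single-precision peak
CLUSTER_SUSTAINED_FRACTION = 0.10   # the 10% of peak assumed in the text

cluster_sustained = CLUSTER_PEAK_GFLOPS * CLUSTER_SUSTAINED_FRACTION
speedup = PROGRAPE3_GFLOPS / cluster_sustained

# Cost efficiency in GFLOPS per 1000 USD.
fpga_per_kusd = PROGRAPE3_GFLOPS / (BOARD_COST_USD / 1000)
cluster_per_kusd = cluster_sustained / (BOARD_COST_USD / 1000)

print(f"cluster sustained: {cluster_sustained:.1f} GFLOPS")
print(f"speedup:           {speedup:.2f}x")
print(f"GFLOPS per $1000:  FPGA {fpga_per_kusd:.2f}, cluster {cluster_per_kusd:.2f}")
```

The result depends on the sustained fraction: at 20% of peak instead of 10%, the cluster would match the board.
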


milstar: In Q2 2020, 32,941 servers of all types were shipped to the Russian market, 2.8% more than in the same period of the previous year, analysts at IDC Russia reported, citing the IDC EMEA Quarterly Server Tracker. In monetary terms, quarterly server sales in Russia amounted to $262.38 million, 15.9% more than in Q2 2019. For comparison, 30,767 servers worth $231.16 million were sold in Q1 2020. In the last quarter of 2019, 48,260 servers worth $362.85 million were sold in Russia. Over the whole of 2019, the Russian server market passed the one-billion-dollar mark for the first time ($1.035 billion), growing 7.6% over the year. https://www.cnews.ru/news/top/2020-09-10_rossijskij_servernyj_rynok

milstar: https://kit-e.ru/build_in_systems/vysokoproizvoditelnye-mnogoyadernye-proczessory-dlya-vstraivaemyh-sistem/
The CSX700 processor. The CSX700 architecture was designed to address the so-called size, weight and power (SWAP) problem, which is usually the main constraint for embedded high-performance applications. By integrating processors, system interfaces and on-chip memory with error correction, the CSX700 is a reasonably economical, reliable and high-performance solution that meets the requirements of modern applications [6-8].
Concentration of forces should be regarded as the norm, and their dispersal as an exception that requires justification. Clausewitz
1. The proliferation of UAVs ( https://www.sandia.gov/RADAR/video/index.html https://www.eetimes.com/radar-basics-part-4-space-time-adaptive-processing/ ), synthetic-aperture radar ( http://www.npomash.ru/activities/ru/space1.htm ) and missile seeker heads with target discrimination ( http://siagat.ru/ ) calls for the creation of a specialized processor.
2. Evaluation criteria: 2D FFTs of 1024x1024 or 2048x2048 per second, the ratio of GFLOPS to power consumed, radiation hardness, flexibility of implementation, development cost and schedule.
3. Shrinking the process node and chasing the standards of consumer processors is a mistake; 0.09 micron gives a strong result for space applications.
http://www.ann.ece.ufl.edu/courses/eel6686_15spr/slides/Presentation1_Morales_Onishi.pdf http://people.cs.bris.ac.uk/~simonm/publications/ClearSpeed_HPEC08.pdf
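
To make the "2D FFTs per second" criterion comparable with a GFLOPS rating, one can estimate the operation count of a single transform. The sketch below assumes the conventional 5·N·log2(N) floating-point operations for a complex radix-2 FFT of length N and a row-column 2D FFT (N row transforms plus N column transforms). Both are modelling assumptions, not figures from the post, and the 100 GFLOPS sustained rate is a made-up example value.

```python
def fft2d_flops(n: int) -> int:
    """Approximate FLOP count of an n x n complex 2D FFT (row-column method)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1        # log2(n) for a power of two
    flops_1d = 5 * n * stages          # conventional estimate for one 1D FFT
    return 2 * n * flops_1d            # n row FFTs plus n column FFTs


def transforms_per_second(sustained_gflops: float, n: int) -> float:
    """2D FFTs per second at a given sustained rate (ignores memory traffic)."""
    return sustained_gflops * 1e9 / fft2d_flops(n)


EXAMPLE_SUSTAINED_GFLOPS = 100.0       # hypothetical processor, for illustration only

for size in (1024, 2048):
    print(size, fft2d_flops(size),
          round(transforms_per_second(EXAMPLE_SUSTAINED_GFLOPS, size), 1))
```

In practice the corner turn between the row and column passes is often limited by memory bandwidth, so this is an upper bound rather than a prediction.
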

milstar: 16 nanometer FinFET process from Taiwan Semiconductor Manufacturing Corp.
The Aurora chip is 33 millimeters by 15 millimeters, for a die size of 494 square millimeters.
Something between 275 watts and 295 watts.
Memory bandwidth at 1.2 TB/sec.
The Aurora chip has 16,384 running at 1.6 GHz, which yields 307.3 gigaflops per core and 2.45 teraflops per socket. https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/
The package itself is very big at 60 mm x 60 mm. The VE processor die itself is 15 mm x 33 mm with a very large interposer with a total Si area of 1,235 mm² (32.5 mm x 38 mm). https://en.wikichip.org/wiki/nec/microarchitectures/sx-aurora
16 nm process; 4,800,000,000 transistors; 14.96 mm x 33.00 mm; 493.68 mm² die size

milstar: In the February issue of the journal Voprosy Radioelektroniki, the heads of MCST and INEUM, the companies developing the Elbrus processors, shared details of an advanced design by Russian engineers, the Elbrus-16SV microprocessor. According to the article, the Elbrus-16SV is currently under development. The chip will be the sixth generation of the Elbrus architecture and will be manufactured on a 16 nm process. The approximate date for its production launch is 2021.
Specifications of the Elbrus-16SV microprocessor:
Performance: up to 1500 GFLOPS;
Number of cores: 16;
Clock frequency: 2 GHz;
Non-inclusive cache (L2+L3): 40 MB;
Main memory: DDR4, four channels (up to 102 GB/s);
Interfaces: PCIe 3.0, 1/10 Gb Ethernet, SATA 3.0, USB 3.0 and others;
Power consumption: 100 W;
Number of transistors: 6 billion;
Die area: 400 mm2;
Other: up to four microprocessors can be combined on shared memory.

milstar: The K1879VM8Ya (К1879ВМ8Я) VLSI chip includes:
a 32-bit general-purpose control RISC processor, ARM Cortex-A5;
4 (four) clusters, each containing an ARM Cortex-A5 RISC processor and four NMC4 processor cores;
5 (five) interfaces to external DDR3/DDR3L memory;
interfaces: PCIe 2.0, SPI, Ethernet IEEE Std 802.3-2012, GPIO, JTAG;
high-speed interfaces for inter-processor communication.
Main characteristics:
Process technology: 28 nm CMOS
Package: 1444 HFCBGA
ARM core clock frequency: 800 MHz
NMC4 core clock frequency: 1 GHz
FP32 performance: 512 GFLOPS
FP64 performance: 128 GFLOPS
Typical power consumption: 12 W
Maximum power consumption: 35 W
On-chip memory: 83.5 Mbit
Aggregate bandwidth of the inter-processor interfaces: 16 Gbit/s
Bandwidth of the external memory interfaces: 256 Gbit/s
Operating conditions: -40°C to +65°C
Parameters of the control ARM Cortex-A5 RISC processor: clock frequency 800 MHz; ISA ARMv7; address width 32 bits; L1 cache: 32 KB instructions, 32 KB data; L2 cache: 512 KB.
Parameters of the NMC4 processor cores: clock frequency 1000 MHz; ISA NMC4; address space 4G x 32 bits; processing of 32-bit and 64-bit floating-point data; performance (single precision) 32 GFLOPS; performance (double precision) 8 GFLOPS; 4 compute cores; a data repacking unit that converts integer data to single- and double-precision floating point and back.
Applications: video processing; sonar and radar; 3D machine vision; neural networks; heterogeneous computing systems.
https://www.module.ru/directions/aviacia/82-18798
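
The chip-level figures in the datasheet above are consistent with the per-core figures if the 32 and 8 GFLOPS values are read as per NMC4 core. That reading is an assumption, although the arithmetic supports it. A small sketch of the check, plus the GFLOPS-per-watt ratio the thread uses as a criterion; names are illustrative:

```python
# Datasheet values quoted in the post.
CLUSTERS = 4
NMC4_CORES_PER_CLUSTER = 4
FP32_GFLOPS_PER_CORE = 32     # assumed to be per NMC4 core
FP64_GFLOPS_PER_CORE = 8      # assumed to be per NMC4 core
TYPICAL_POWER_W = 12
MAX_POWER_W = 35

nmc4_cores = CLUSTERS * NMC4_CORES_PER_CLUSTER
total_fp32 = nmc4_cores * FP32_GFLOPS_PER_CORE
total_fp64 = nmc4_cores * FP64_GFLOPS_PER_CORE

print(f"NMC4 cores: {nmc4_cores}")
print(f"FP32: {total_fp32} GFLOPS, FP64: {total_fp64} GFLOPS")
print(f"FP32 per watt: {total_fp32 / TYPICAL_POWER_W:.1f} (typical), "
      f"{total_fp32 / MAX_POWER_W:.1f} (maximum)")
```

These are peak figures divided by datasheet power, so they say nothing about sustained efficiency on a real workload.
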

milstar: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01197-radar-fpga-or-gpu.pdf

milstar: The main limiting factor is multiplier resources. This is excellent news for the designer, because it is much more feasible to use 100% of DSP block resources than 100% of logic resources. As the logic usage approaches 100%, timing closure can get more difficult. Also, additional logic is needed to get data in and out of the FPGA to build memory buffer interfaces and other functions. https://www.techdesignforums.com/practice/technique/achieving-teraflops-performance-with-28nm-fpgas/

milstar: Xilinx, Inc., the leader in adaptive and intelligent computing, today announced the expansion of its 16 nanometer (nm) Virtex UltraScale+ family to now include the world's largest FPGA — the Virtex UltraScale+ VU19P. With 35 billion transistors, the VU19P provides the highest logic density and I/O count on a single device ever built, enabling emulation and prototyping of tomorrow's most advanced ASIC and SoC technologies, as well as test, measurement, compute, networking, aerospace and defense-related applications. The VU19P sets a new standard in FPGAs, featuring 9 million system logic cells, up to 1.5 terabits per-second of DDR4 memory bandwidth and up to 4.5 terabits per-second of transceiver bandwidth, and over 2,000 user I/Os. It enables the prototyping and emulation of today's most complex SoCs as well as the development of emerging, complex algorithms such as those used for artificial intelligence, machine learning, video processing and sensor fusion. The VU19P is 1.6X larger than its predecessor and what was previously the industry's largest FPGA — the 20 nm Virtex UltraScale 440 FPGA.

milstar: https://www.intel.com/content/www/us/en/products/programmable/fpga/stratix-10.html Intel® Stratix® 10 GX FPGAs are designed to meet the high-performance demands of high-throughput systems with up to 10 TFLOPS of floating-point performance and transceiver support up to 28.3 Gbps for chip-module, chip-to-chip, and backplane applications. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/stratix-10-product-table.pdf Highest performance per watt Up to 10 TFLOPS single-precision floating-point performance. Up to 80 GFLOPS/Watt. Cover fMAX up to 1 GHz enabling high throughput beam processing.

milstar: https://www.microsemi.com/product-directory/fpga-soc/1640-rad-tolerant-fpgas Rad-Tolerant FPGAs

milstar: The FFTC resides on a number of TI multicore system-on-chips (SoCs), including 66AK2L06, TCI6636K2H and TCI6638K2K SoCs, which contain both ARM® Cortex®-A15 cores and C66x digital signal processor (DSP) cores. To put the FFTC performance in perspective, the TCI6638K2K can implement over 4.5 million, 1024-point FFT operations per second using six FFTCs, compared to an equivalent ~300,000 FFTs per second using six C66x DSP cores. https://www.ti.com/lit/wp/spry294/spry294.pdf
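
For scale, the TI figures above work out as follows. The inputs are the approximate values from the quoted white paper ("over 4.5 million" and "~300,000"), so the results are approximate as well; the names are illustrative.

```python
# Approximate throughput figures quoted for the TCI6638K2K.
FFTC_COUNT = 6
FFTC_TOTAL_FFTS_PER_S = 4_500_000    # 1024-point FFTs per second, six FFTCs
DSP_CORE_COUNT = 6
DSP_TOTAL_FFTS_PER_S = 300_000       # same workload on six C66x cores

speedup = FFTC_TOTAL_FFTS_PER_S / DSP_TOTAL_FFTS_PER_S
ffts_per_fftc = FFTC_TOTAL_FFTS_PER_S / FFTC_COUNT
ffts_per_dsp_core = DSP_TOTAL_FFTS_PER_S / DSP_CORE_COUNT
microseconds_per_fft = 1e6 / ffts_per_fftc   # per FFTC, one 1024-point FFT

print(f"coprocessors vs DSP cores: {speedup:.0f}x")
print(f"per FFTC: {ffts_per_fftc:.0f} FFT/s, per C66x core: {ffts_per_dsp_core:.0f} FFT/s")
print(f"time per 1024-point FFT on one FFTC: {microseconds_per_fft:.2f} us")
```
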

milstar: http://www.ann.ece.ufl.edu/courses/eel6686_15spr/papers/RADSPEED.pdf

milstar: https://www.baesystems.com/en-us/our-company/inc-businesses/electronic-systems/product-sites/space-products-and-processing/radiation-hardened-electronics

milstar: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9158-multi-gpu-fft-performance-on-different-hardware-configurations.pdf

milstar: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/dgx-2h-datasheet-us-nvidia-841283-r5.pdf

milstar: https://vk.com/doc503118849_575915834?hash=da97ca2ae340127304 https://vk.com/doc503118849_575915996?hash=0a80c936dd4aff71a1

milstar: https://baikalelectronics.ru/about/press-center/news/352/ Is there a danger that the United States could shut this operation down for you, given that even the Taiwanese fab uses technologies invented in the USA? - In theory such a risk exists; if I told you it did not, I would be deceiving you. How serious it is, how real it is and how it might materialize is hard to assess. We all know the example of Huawei, which is supposedly under American sanctions and supposedly in bad shape, yet Huawei has its chips made at TSMC on the very latest seven-nanometer process and has no problems. We can use examples like this as a reference point when assessing our own risks.

milstar: A video standard of 1024 × 768 provides 1024 picture elements (pixels) of horizontal resolution. If a 512-point Fourier transform is performed, the 256 points generated by the transform fit nicely on a screen 1024 pixels wide. The same is true of a 1024-point transform, where a 1024 pixel wide screen is more than adequate to contain the 512 points generated by the transform. The problem arises when a transform larger than 2048 points is performed. Say an 8192-point Fourier transform is performed. The 4096 points generated by the transform are much wider than the 1024 pixel width of the screen. In order to get the entire power spectrum on one screen width, a compression factor (in this case, a factor of 4) must be applied. Magnification must then be applied to examine the spectrum at the full resolution of the 8192-point transform. Additionally, when a magnification factor is applied that prohibits the display of the entire power spectrum on a single screen width, the software should allow you to pan the entire plot one screen width at a time. https://www.dataq.com/data-acquisition/general-education-tutorials/fft-fast-fourier-transform-waveform-analysis.html
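
The compression factor described above is the number of spectrum points divided by the screen width, rounded up. A minimal sketch, assuming (as the excerpt does) that an N-point transform yields N/2 power-spectrum points; the function names are made up for illustration:

```python
import math


def spectrum_points(transform_size: int) -> int:
    """Power-spectrum points produced by a transform of the given size."""
    return transform_size // 2


def compression_factor(transform_size: int, screen_width: int = 1024) -> int:
    """Smallest integer factor that fits the whole spectrum on one screen width."""
    return max(1, math.ceil(spectrum_points(transform_size) / screen_width))


for size in (512, 1024, 2048, 8192):
    print(size, spectrum_points(size), compression_factor(size))
```

At full resolution the same number gives the count of screen widths the user has to pan through, which is four for the 8192-point case.
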

milstar: FPGA vendor Xilinx later standardized on the 18×25 size multiplier with the DSP48E architecture (so named for its 48 bit accumulator). This was to better support FFTs, where bit growth occurs in the data path as described above. The twiddle factors, or coefficients, remain at 18 bits, while the 18 bit data can grow up to 7 more bits, to 25 bits. FPGA vendor Altera countered with an 18×36 multiplier mode. The latest innovation occurred with Altera’s 28nm Stratix V FPGAs, offering a new Variable Precision DSP architecture. This architecture has native support for 18×18, 18×36 and 27×27 precision multipliers, all featuring 64 bit accumulators. It supports higher precision fixed point processing for FIR filters, correlators, accumulators and FFTs, and also includes a highly efficient 18×25 complex multiplication mode specifically designed for fixed point FFTs. The 18×25 complex mode can be implemented with only three multipliers rather than the normally required four multipliers, by leveraging built-in pre-adders and post-adders. Another recent FPGA innovation is high-performance floating-point support that enables the FPGA parallel hardware architecture advantages to be used in applications where the dynamic range of floating-point is required, such as radar processing. This is now available from Altera, using the Variable Precision DSP hardware architecture and new floating-point tool flow known as “Fused Datapath”. https://www.eetimes.com/radar-basics-part-3-beamforming-and-radar-digital-processing/
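
The three-multiplier complex product mentioned above is a well-known algebraic rearrangement. The sketch below shows one common form of it on plain integers. It is not taken from Altera documentation, and whether the DSP block uses exactly this arrangement is an assumption. The sums formed before each multiplication correspond to pre-adders, the ones formed afterwards to post-adders.

```python
def complex_mul_4(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Textbook (a + bi)(c + di): four multiplications, two additions."""
    return a * c - b * d, a * d + b * c


def complex_mul_3(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Same product with three multiplications and five additions."""
    k1 = c * (a + b)          # pre-add a + b, then multiply
    k2 = a * (d - c)          # pre-subtract d - c, then multiply
    k3 = b * (c + d)          # pre-add c + d, then multiply
    return k1 - k3, k1 + k2   # post-adds give the real and imaginary parts


# (3 + 4i)(5 + 6i) = -9 + 38i
print(complex_mul_3(3, 4, 5, 6))
print(complex_mul_3(3, 4, 5, 6) == complex_mul_4(3, 4, 5, 6))
```

One cost of the trade is that each pre-added operand is one bit wider than its inputs, which is part of why the multiplier widths matter for fixed-point FFTs.
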

milstar: As will be shown, STAP requires the processing capability to invert matrices containing 100,000 or more elements in well under a millisecond. https://www.eetimes.com/radar-basics-part-3-beamforming-and-radar-digital-processing/
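
A rough order-of-magnitude estimate of what that requirement means in GFLOPS. The assumptions here are mine, not the article's: the matrix is square, a dense inversion costs about 2·n³ real floating-point operations, and the full one-millisecond budget is available. Radar covariance matrices are complex-valued, which multiplies the count several times over, so treat the result as a lower bound.

```python
import math

ELEMENTS = 100_000         # matrix size quoted in the post
TIME_BUDGET_S = 1e-3       # "well under a millisecond", taken as exactly 1 ms

n = math.isqrt(ELEMENTS)               # side of the largest square matrix that fits
inversion_flops = 2 * n ** 3           # assumed cost model for dense inversion
required_gflops = inversion_flops / TIME_BUDGET_S / 1e9

print(f"matrix dimension: {n} x {n}")
print(f"operations per inversion: {inversion_flops:,}")
print(f"sustained rate needed: {required_gflops:.1f} GFLOPS")
```
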
