Combat computing systems (continued)

milstar: http://drops.dagstuhl.de/opus/volltexte/2006/732/pdf/06141.AthanasPeter.Paper.732.pdf Although an FPGA’s clock rate rarely exceeds one-tenth that of a PC, hardware-implemented digital filters can process data at many times the rate of software implementations [4]. Additional performance gains have been described for cryptography [5], network packet filtering [6], target recognition [7] and pattern matching [8], among other applications.
A. Present Day Cost-Performance Comparison
Owing to the prevalence of IEEE standard floating-point in a wide range of applications, several researchers have designed IEEE 754 compliant floating-point accelerator cores constructed out of the Xilinx Virtex-II Pro FPGA’s configurable logic and dedicated integer multipliers [16-18]. Dou et al. published one of the highest performance benchmarks, 15.6 GFLOPS, by placing 39 floating-point processing elements on a theoretical Xilinx XC2VP125 FPGA [19]. Interpolating their results for the largest production Xilinx Virtex-II Pro device, the XC2VP100, produces 12.4 GFLOPS, compared to the peak 6.4 GFLOPS achievable for a 3.2 GHz Intel Pentium processor. Assuming that the Pentium can sustain 50% of its peak, the FPGA outperforms the processor by a factor of four for matrix multiplication.
One of the earlier projects demonstrated a 23x speedup on a 2-D FFT through the use of a custom 18-bit floating-point format [26]. More recent work has focused on parameterizable libraries of floating-point units that can be tailored to the task at hand [27-29]. By using a custom floating-point format sized to match the widths of the FPGA’s internal integer multipliers, a speedup of 44 was achieved for a hydrodynamics simulation [30] using four large FPGAs.
Nakasato and Hamada’s 38 GFLOPS of performance is impressive, even from a cost-performance standpoint. For the cost of their PROGRAPE-3 board, estimated at $15,000, it is likely that a 15-node processor cluster could be constructed producing 196 single-precision peak GFLOPS. Even in the unlikely scenario that this cluster could sustain the same 10% of peak performance obtained by Nakasato and Hamada for their software implementation, the PROGRAPE-3 design would still achieve a 2x speedup.
As in many FPGA to CPU comparisons, it is likely that the analysis unfairly favors the FPGA solution. Hardware implementations require specialized skills in digital design and vendor-specific tool flows. Development time and costs are significantly higher than for software. Many comparisons in the literature spend significantly more time optimizing the hardware implementations than they do optimizing their software implementations. Previous research has demonstrated significant compiler inefficiency for common HPC functions [31]. For the DGEMM matrix multiplication function, a hand-coded version outperformed the compiler by greater than eight times.
A total of 39 PEs can be integrated into the xc2vp125-7 FPGA, reaching performance of, e.g., 15.6 GFLOPS with 1600 KB local memory and 400 MB/s external memory bandwidth.
That is one IC with 1,700 pins and a high cost, on the order of $8,000 today. http://ce.et.tudelft.nl/~george/publications/Conf/FPGA05/FPGA05Dou.pd http://www.xilinx.com/publications/matrix/virtexmatrix.pd Xilinx Virtex FPGA
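The two speedup ratios quoted in this post can be rechecked with a few lines of arithmetic. This is only a sketch: the figures are the ones quoted above, and the variable names are made up for illustration.

```python
# Figures taken from the quoted text; variable names are illustrative only.
fpga_gflops = 12.4             # XC2VP100, interpolated from Dou et al.
pentium_peak_gflops = 6.4      # 3.2 GHz Pentium, peak
pentium_sustained = 0.5 * pentium_peak_gflops    # assumed 50% of peak

fpga_vs_cpu = fpga_gflops / pentium_sustained

progrape3_gflops = 38.0        # Nakasato and Hamada, PROGRAPE-3
cluster_peak_gflops = 196.0    # 15-node cluster, single-precision peak
cluster_sustained = 0.10 * cluster_peak_gflops   # assumed 10% of peak

progrape_vs_cluster = progrape3_gflops / cluster_sustained

print(f"FPGA vs Pentium:       {fpga_vs_cpu:.2f}x")
print(f"PROGRAPE-3 vs cluster: {progrape_vs_cluster:.2f}x")
```

Both ratios come out just under the rounded "factor of four" and "2x" given in the text.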

milstar: Again, the values of Xk represent the amount of signal energy at each frequency point, equally spaced across the sampled frequency spectrum. Since Xk is a complex number, it provides both the magnitude and phase of each frequency component. These points in the frequency spectrum are often referred to as frequency bins. As N becomes larger, the spectrum is divided into more bins, with closer frequency spacing, providing for finer frequency discrimination. https://www.eetimes.com/radar-basics-part-3-beamforming-and-radar-digital-processing/
Normally, FFTs are used to take a time domain signal and separate it into its different frequency components. In this case, the FFT will separate the incoming signal into its different spatial components, or angle of arrival components. The input signals are sorted by the FFT into bins corresponding to different angles of arrival, as shown in Figure 3. Similarly, in the transmit direction, a signal fed into each FFT bin input will be transmitted in a specific direction, corresponding to a specific antenna lobe. If the input to an FFT bin is zero, no energy will be transmitted in that direction; the transmit lobe will be “missing”.
The FFT method of beamforming is computationally very efficient and allows for multiple directional signals to be simultaneously received and transmitted. This can be a very useful capability in multi-mode radar, which must track multiple targets simultaneously. However, the spacing and direction of the N antenna beams are fixed and equally spaced in direction, ranging over 180 degrees from the antenna array. In keeping with the characteristics of the FFT, the peak of any given antenna beam lies exactly on the null of the sidelobes of all the neighboring antenna beams. This characteristic is known as orthogonality.
This processing requires that all the data be present in the array before any Doppler processing can be performed.
The amount of data can be quite large and, for high performance radar processing, needs to be accessed with very low latency. This requires either very high on-chip memory resources or a very low latency, fast random access external memory array coupled with a high performance memory access controller. Since the data comes in columns and is read in rows, the read and write accesses cannot both be sequential, making it difficult to meet the low latency requirements with traditional caches and DDR memory chips.
There can be hundreds or even thousands of separate receive/transmit units in an AESA antenna. The antenna may be tracking targets in multiple directions, requiring separate processing for each. The processing must be performed over two dimensions, both time (pulse compression) and frequency (Doppler). Furthermore, installment #4 of this Radar Basics series shows how an additional processing dimension, spatial, can be added in space-time adaptive processing (STAP), which causes a further dramatic increase in digital signal processing requirements.
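The FFT beamforming idea described in this post can be tried on a toy array. The sketch below is not from the article: it assumes an 8-element uniform linear array with half-wavelength spacing, one particular sign convention for the phase ramp, and a plain O(N^2) DFT instead of a real FFT. A plane wave whose arrival angle sits exactly on one beam lands entirely in that bin, and every other bin is at a null, which is the orthogonality property mentioned above.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT, O(N^2); good enough for a small illustration."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

N = 8              # number of antenna elements (assumed)
spacing = 0.5      # element spacing in wavelengths (assumed)
target_bin = 2     # choose an arrival angle that falls exactly on beam 2
sin_theta = target_bin / (N * spacing)   # here sin(theta) = 0.5, i.e. 30 degrees

# One snapshot across the array: a plane wave produces a linear phase ramp.
snapshot = [cmath.exp(2j * math.pi * spacing * m * sin_theta) for m in range(N)]

beams = [abs(v) for v in dft(snapshot)]
peak_bin = max(range(N), key=lambda k: beams[k])

print("beam magnitudes:", [round(b, 6) for b in beams])
print("peak in bin", peak_bin)
```

An arrival angle between two beams would spread energy over several bins, which is the price of the fixed, equally spaced beam directions.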

milstar: Radar systems are very challenging to design, partly because of the dynamic range of the signals involved. Referring back to the radar range equation, the signal level at the receiver is inversely proportional to the fourth power of the distance to the target. The sensitivity levels required by a radar receiver are far more demanding than those of any wireless communications system. Simultaneously, the radar receiver must cope with potentially very high receive signal levels due to clutter, jamming, interference, close range targets, or even from the transmitter itself. This requires digital signal processing techniques with high numerical fidelity. In order to achieve proper system performance with potentially very low signal levels and a high level of interference, the quantization noise levels introduced during digital processing must be well below the receiver noise floor.
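To get a feel for the dynamic range involved, here is a small sketch that assumes the standard 1/R^4 dependence of the radar range equation with everything else held fixed. The function name and the example ranges are made up for illustration.

```python
import math

def echo_change_db(r_old, r_new):
    """Change in received echo power (dB) when target range goes r_old -> r_new.

    Assumes received power scales as 1/R^4, all other terms unchanged.
    """
    return -40.0 * math.log10(r_new / r_old)

double_range = echo_change_db(1.0, 2.0)     # doubling the range
near_to_far = echo_change_db(1.0, 100.0)    # e.g. 1 km versus 100 km

print(f"range x2:   {double_range:.2f} dB")
print(f"range x100: {near_to_far:.2f} dB")
```

Doubling the range costs about 12 dB, and a factor of 100 in range costs 80 dB before clutter or jamming is even considered.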

milstar: Most processors and digital signal processors (DSPs) operating with 16-bit word lengths are not sufficient for many aspects of radar processing. Another option is to use floating point processors. With single precision floating point, a 24-bit mantissa (including sign bit) provides 144 dB. And the floating point exponent (8 bits) allows this 144 dB range to automatically adjust or “float” to the signal level at each operation, providing tremendous dynamic range. However, the floating point processors often found in radar systems, such as Analog Devices’ TigerSHARC or Freescale’s PowerPC, have limited processing capability.
Newer processor architectures offer higher levels of floating point processing capability, primarily through the use of many cores. The trade-off is a more difficult development environment, requiring complex data flow management and the elimination of data dependencies between functions so that they can be partitioned across multiple processors without stalling. Power consumption can also be a challenge in these architectures. https://www.eetimes.com/radar-basics-part-3-beamforming-and-radar-digital-processing/
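The 144 dB figure follows from the usual rule of thumb of about 6 dB per bit. A minimal sketch, assuming dynamic range is taken as the ratio of full scale to one LSB (the function name is illustrative):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one LSB in dB: 20*log10(2**bits)."""
    return 20.0 * math.log10(2.0 ** bits)

dr16 = dynamic_range_db(16)   # 16-bit fixed-point word
dr24 = dynamic_range_db(24)   # 24-bit single-precision mantissa

print(f"16 bits: {dr16:.1f} dB")
print(f"24 bits: {dr24:.1f} dB")
```

The 16-bit case comes out near 96 dB, which is why the post calls 16-bit words insufficient for many radar tasks.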

milstar: As will be shown, STAP requires the processing capability to invert matrices containing 100,000 or more elements in well under a millisecond.

milstar: http://www.themobilestudio.net/the-fourier-transform-part-13 The Fast Fourier Transform – Numerical Example
What frequencies make up the following signal? This signal has 16 samples in it, so we are going to run a 16-point FFT to find out the answer. The 16 samples in the signal have the following values: x(0) ... x(15)
1. Firstly we divide and reorder our samples into groups of 2 using the bit reversal method, so sample x(0) gets grouped with sample x(8), sample x(4) gets grouped with sample x(12), etc.
2. Now we perform a 2-point DFT on each of the sample pairs. This is very easy, as we simply add the samples together for the first term, then subtract them for the second, so:
a_0 = x_0 + x_8 = 0.5 + 0.5 = 1
a_1 = x_0 - x_8 = 0.5 - 0.5 = 0
a_2 = x_4 + x_12 = -0.25 - 0.75 = -1
a_3 = x_4 - x_12 = -0.25 + 0.75 = 0.5
…and so on for each of the eight 2-point DFTs. So the results for all the 2-point DFTs are as follows: a(0), a(1) ... a(14), a(15)
3. The next stage is to start combining the results of the previous stage into larger and larger DFTs until we arrive back at a 16-point DFT. So the next stage is to combine the eight 2-point DFTs into four 4-point DFTs. We use the output of the previous stage to form the input of the next stage, so we can treat it like a 2-point DFT. It is now that the twiddle factors begin to come into play, so let’s remind ourselves of their values for a 4-point DFT: [...]
Notice that for the first time, in the 4-point DFT, we have an imaginary term, so there is going to be a sine component to some of the results as well as a cosine component. This makes the multiplication by the twiddle factor a little more “complex”, as the twiddle factor is a complex number. Before we begin with the numeric calculation, I want to take a quick look at multiplication with complex numbers.
Multiplication with Complex Numbers
A complex number can be one of three types of numbers:
A completely real number (a number with the imaginary part equal to zero)
A completely imaginary number (a number with the real part equal to zero)
A number with both a real and an imaginary component (a number with a non-zero real part and a non-zero imaginary part)
Therefore there can be four types of multiplication:
A real number multiplied by another real number
A real number multiplied by an imaginary number
An imaginary number multiplied by another imaginary number
A complex number multiplied by another complex number
http://www.themobilestudio.net/the-fourier-transform-part-13
So the calculations for the 4-point DFTs will work as follows:
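The quoted example breaks off here, so what follows is only a sketch of what the first 4-point stage could look like, not the original author's worked numbers. It uses the four 2-point results given above (a_0 to a_3) and assumes the usual forward-DFT twiddle convention W = exp(-j*2*pi*k/4), so W^0 = 1 and W^1 = -j. The complex multiplication is written out with real arithmetic to match the four cases listed in the post; the helper names are made up.

```python
def cmul(a, b):
    """(ar + j*ai) * (br + j*bi), written out with real arithmetic."""
    ar, ai = a
    br, bi = b
    return (ar * br - ai * bi, ar * bi + ai * br)

def cadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def csub(a, b):
    return (a[0] - b[0], a[1] - b[1])

# 2-point DFT outputs from the example, stored as (real, imag) pairs.
a0, a1 = (1.0, 0.0), (0.0, 0.0)     # from x(0) and x(8)
a2, a3 = (-1.0, 0.0), (0.5, 0.0)    # from x(4) and x(12)

# Assumed twiddle factors for a 4-point DFT: W^0 = 1, W^1 = -j.
w0, w1 = (1.0, 0.0), (0.0, -1.0)

t0 = cmul(w0, a2)
t1 = cmul(w1, a3)

# Butterflies: the 4-point DFT of (x0, x4, x8, x12).
b = [cadd(a0, t0), cadd(a1, t1), csub(a0, t0), csub(a1, t1)]
print(b)
```

The second and fourth outputs are purely imaginary, which is the "sine component" the post says appears for the first time at the 4-point stage.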

milstar: https://lr.ttu.ee/irm/sideseadmete_mudeldamine/5.pdf The radix-2 FFT algorithm splits the full DFT computation into a combination of 2-point DFTs. Each 2-point DFT contains a basic multiply-accumulate operation called a "butterfly", illustrated in Fig. 5.13. The diagram shows two representations of the butterfly: the upper diagram is in effect a functional representation of the butterfly built from digital multipliers and adders. In the simplified lower diagram, multiplications are marked by a multiplier written next to an arrow, and summation is implied by two arrows converging at a point.
The 8-point decimation-in-time (DIT) FFT computes the final result using three stages, as follows from Fig. 5.14. The eight input time-domain samples are first divided (or decimated) into four groups of 2-point DFTs. The four 2-point DFTs are then combined into two 4-point DFTs. The two 4-point DFTs are then combined to produce the final result X(k). The process is examined in detail in Fig. 5.15, which shows all the multiplications and additions. It is easy to see that the basic butterfly operation of the 2-point DFT forms the basis of the entire computation.
The computation is carried out in three stages. Once the first stage has been computed, there is no need to keep any earlier results. The results of the first stage can be stored in the same registers or memory locations that originally held the time-domain samples x(n). Likewise, once the second stage has been computed, the results of the first stage can be discarded. The last stage is computed in the same way, overwriting in memory the intermediate results of the previous stage. Note that for the algorithm to work properly, the input time samples x(n) must be ordered in a specific way using the bit-reversal algorithm.
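The in-place, stage-by-stage scheme described above can be sketched in a few lines. This is a generic radix-2 decimation-in-time FFT written for illustration, not code from the cited lecture notes; the function names are made up. Each stage overwrites the buffer that held the previous stage, and the input is first put into bit-reversed order.

```python
import cmath
import math

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_in_place(x):
    """Radix-2 DIT FFT that overwrites the list x; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # Reorder the input samples by bit reversal.
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:
            x[i], x[j] = x[j], x[i]
    # log2(n) stages of butterflies, each reusing the same storage.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size)   # twiddle factor
                top = x[start + k]
                bottom = w * x[start + k + half]
                x[start + k] = top + bottom
                x[start + k + half] = top - bottom
        size *= 2
    return x

order = [bit_reverse(i, 3) for i in range(8)]
print("8-point input order:", order)
```

For 8 points the loop runs three stages (2-point, 4-point, 8-point), matching the three cascades in the text.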

milstar: BAE rad hard ASIC 0.045 micron https://www.baesystems.com/en-us/product/radiation-hardened-application-specific-integrated-circuits--asics- Today, BAE Systems has over 1,000 computers on more than 300 satellites, and our space computers have logged more than 10,000 years of flight time without a failure

milstar: 1 Jul 2020 BAE Systems’ software defined radio is anchored by the RAD5545 single board computer (SBC), providing the most advanced radiation-hardened quad core general purpose processing solution available today to address future threats on a variety of missions. The system leverages modular and standard building blocks including a SpaceVPX chassis and backplane electrical connectors, Serial RapidIO® and Spacewire interfaces, and a fully supported expansion port for a custom interface card. https://www.baesystems.com/en/article/bae-systems-delivers-first-radiation-hardened-rad5545-radios

milstar: http://flightsoftware.jhuapl.edu/files/2016/Day-2/Day-2-13-Saridakis.pdf

milstar: https://www.moog.com/content/dam/moog/literature/Space_Defense/spaceliterature/avionics/moog-multi-core-dsp-processor-datasheet.pdf RAD TOLERANT, 150 GFLOP DSP SpaceVPX SINGLE BOARD COMPUTER

milstar: An Intel Core Duo at 3.0 GHz computes an 8192-point FFT in 96.8 µs, i.e. approx. 10,300 8192-point 1D FFTs per second.
65 nm, 65 nm, 45 nm
OFDM is extensively used in wireless LAN and MAN applications, including IEEE 802.11a/g/n and WiMAX. IEEE 802.11a/g/n, operating in the 2.4 and 5 GHz bands, specifies per-stream airside data rates ranging from 6 to 54 Mbit/s. If both devices can use "HT mode" (added with 802.11n), the top 20 MHz per-stream rate is increased to 72.2 Mbit/s, with the option of data rates between 13.5 and 150 Mbit/s using a 40 MHz channel. Four different modulation schemes are used: BPSK, QPSK, 16-QAM, and 64-QAM, along with a set of error correcting rates (1/2–5/6). The multitude of choices allows the system to adapt the optimum data rate for the current signal conditions.
A High-Performance, Low-Power Linear Algebra Core, 45 nm: https://www.cs.utexas.edu/~flame/pubs/ASAP11.pdf
BAE space rad hardened, 0.09 micron: http://www.ann.ece.ufl.edu/courses/eel6686_15spr/papers/RADSPEED.pdf (2015)
ClearSpeed CSX700, 0.09 micron: http://people.cs.bris.ac.uk/~simonm/publications/ClearSpeed_HPEC08.pdf 172,792 2048-point 1D FFTs per second, or 43,198 8192-point 1D FFTs per second.
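The FFT rates in this post can be rechecked as follows. The 43,198 figure corresponds to scaling the 2048-point rate by the length ratio alone (8192 / 2048 = 4). As a hedged alternative, the sketch also scales by N*log2(N), the usual FFT operation count; this is an assumption about how the hardware scales, not a measured figure from the ClearSpeed paper.

```python
import math

core_duo_fft_time_s = 96.8e-6          # 8192-point FFT, from the post
core_duo_ffts_per_s = 1.0 / core_duo_fft_time_s

csx700_2048_per_s = 172_792            # 2048-point FFTs per second, from the post

# Figure used in the post: scale by the ratio of lengths only.
linear_estimate = csx700_2048_per_s / (8192 / 2048)

# Alternative assumption: cost grows as N * log2(N).
def nlogn(n):
    return n * math.log2(n)

nlogn_estimate = csx700_2048_per_s * nlogn(2048) / nlogn(8192)

print(f"Core Duo: {core_duo_ffts_per_s:.0f} FFTs/s")
print(f"CSX700, linear scaling:  {linear_estimate:.0f} FFTs/s")
print(f"CSX700, N log N scaling: {nlogn_estimate:.0f} FFTs/s")
```

Under the N*log2(N) assumption the CSX700 estimate for 8192 points is somewhat lower than the linear one, so the comparison with the Core Duo should be read as approximate.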

milstar: For example, an FFT Length of 512 will set Standard to 802.11ax 40 MHz.
----------------
To achieve the higher data rates required, on the order of 10 Gbps, 5G technology has been developed. The specifications are published in 3GPP Release 15 and beyond. 5G has different frequency ranges: sub-6 GHz (5G macro optimized), 3-30 GHz (5G E small cells) and 30-100 GHz (5G Ultra Dense). FFT sizes: 4096 points; 1024 points (70 GHz); 2048 points (3 to 40 GHz). https://www.rfwireless-world.com/Tutorials/5G-millimeter-wave-tutorial.html
The LTE FFT LogiCORE™ IP provides support for all transform point sizes defined by the 3GPP-LTE specifications, including the 1536-point transform required for 15 MHz bandwidth support, enabling resource-optimized eNB implementations across all potential eNB configurations. A graphical user interface allows the generation of netlists tailored to the needs of each application. The LTE FFT LogiCORE IP is a component of the Xilinx LTE Baseband Targeted Design Platform.
Key Features and Benefits
Support for all point sizes defined by 3GPP-LTE standards: 128, 256, 512, 1024, 1536, 2048 points
1536-point transform supports 15 MHz bandwidth systems
https://www.xilinx.com/products/intellectual-property/ef-di-lte-fft.html

milstar: Twiddle factor multiplier. Fig. 5 shows the twiddle factor multiplier (multi) module, where the inputs are multiplied with TFs. The TFs are stored in ROMs whose depth depends on the number of FFT points. Considering the symmetric property of trigonometric functions, we can reduce the number of TF values to be stored in the ROMs to N/4 TF values. Since it is necessary to read the corresponding TF values, we use counters http://www.apsipa.org/proceedings/2020/pdfs/0000114.pdf
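The excerpt does not say how the ROM is addressed, so the following is only one possible way to get every twiddle factor out of N/4 stored values. It assumes the forward-DFT convention W = exp(-j*2*pi*k/N) and uses the fact that stepping a quarter turn is a multiplication by -j, which is just a swap of real and imaginary parts with one sign change. The function names are made up.

```python
import cmath
import math

def build_rom(n):
    """Store only the first quarter of the twiddle factors: N/4 complex values."""
    return [cmath.exp(-2j * math.pi * i / n) for i in range(n // 4)]

def twiddle(rom, n, k):
    """Return W_n^k for any integer k using only the quarter-length table."""
    k %= n
    quadrant, idx = divmod(k, n // 4)
    w = rom[idx]
    for _ in range(quadrant):
        # Multiply by -j: (a + jb) * (-j) = b - ja. No real multiplier needed.
        w = complex(w.imag, -w.real)
    return w

N = 16
rom = build_rom(N)
max_error = max(abs(twiddle(rom, N, k) - cmath.exp(-2j * math.pi * k / N))
                for k in range(N))
print("ROM entries:", len(rom), "max error:", max_error)
```

In hardware the quadrant would come from the top two bits of a counter and the table index from the remaining bits, which may be what the counters in the paper are for, though the excerpt does not confirm it.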

milstar: The throughput can be estimated as 4 samples/cycle × 123 MHz = 492 Msamples/s regardless of the number of FFT points, which is a significantly high throughput. Considering the FFT implementation for 5G OFDM, which requires a 4,096-point FFT, we believe that this throughput is reasonable. http://www.apsipa.org/proceedings/2020/pdfs/0000114.pdf
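A quick check of the quoted throughput, plus the time one 4,096-point frame would occupy at that rate. The frame-time line is my own addition and assumes the pipeline is kept full, ignoring its latency.

```python
samples_per_cycle = 4
clock_hz = 123e6

throughput = samples_per_cycle * clock_hz       # samples per second

fft_points = 4096
frame_time_us = fft_points / throughput * 1e6   # microseconds per 4096-point frame

print(f"throughput: {throughput / 1e6:.0f} Msamples/s")
print(f"4096-point frame every {frame_time_us:.2f} us")
```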

milstar: Assuming an FFT-based transmitter/receiver implementation, 15 kHz subcarrier spacing corresponds to a sampling rate fs = 15000 × NFFT (in Hz), where NFFT is the FFT size. Nevertheless, FFT-based implementations of OFDM are common practice, and an FFT size of 2048, with a corresponding sampling rate of 30.72 MHz, is suitable for the wider LTE carrier bandwidths, such as bandwidths of the order of 15 MHz and above. https://www.sciencedirect.com/topics/engineering/downlink-and-uplink-transmission
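The relation fs = 15000 × NFFT is easy to tabulate. The sketch below applies it to the LTE transform sizes listed in the Xilinx post earlier in this thread; the names are illustrative.

```python
SUBCARRIER_SPACING_HZ = 15_000

def sampling_rate_hz(nfft):
    """Sampling rate implied by 15 kHz subcarrier spacing and FFT size nfft."""
    return SUBCARRIER_SPACING_HZ * nfft

rates_mhz = {n: sampling_rate_hz(n) / 1e6
             for n in (128, 256, 512, 1024, 1536, 2048)}

for n, rate in rates_mhz.items():
    print(f"NFFT = {n:4d} -> fs = {rate:5.2f} MHz")
```

The 2048-point row reproduces the 30.72 MHz quoted above.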

milstar: What’s Up With Digital Downconverters—Part 1 https://www.analog.com/en/analog-dialogue/articles/whats-up-with-digital-downconverters-part-1.html#

milstar: Digital Down-Conversion As shown in the figure, the blocks after the ADC are all operating in the digital domain. For example, the outputs of Oscillator 2 in Figure 2 are actually the digital values corresponding to the sine and cosine signals. The second down-conversion is performed using two digital multipliers, and the LPFs are digital filters. https://www.allaboutcircuits.com/technical-articles/dsp-basics-of-digital-down-conversion-digital-signal-processing/

milstar: https://eri-summit.darpa.mil/docs/ERISUMMIT2020/Presentations/2020%20T_AM%20ERI%20DoD%20Unique%20Needs%20v5%20(Distro%20A).pdf

milstar: Consider a radio signal lying in the range 39-40 MHz. The signal bandwidth is 1 MHz. However, it is often digitized with a sampling rate over 100 Msamples per second, representing in the region of 200 Mbyte/second. The DDC allows us to select the 39-40 MHz band and to shift its frequency down to baseband, and in doing so reduce the sample rate: with a 1 MHz bandwidth, a sample rate of 2.5 MHz would be fine, giving a data rate of only 5 Mbyte/second. This is shown in Figure 1. http://hunteng.co.uk/pdfs/tech/ddctheory.pdf
How It Works
A Digital Down Converter is basically a complex mixer, shifting the frequency band of interest to baseband. Consider the spectrum of the original continuous analogue signal prior to digitisation, as shown in Figure 2: because it is a real signal, it has both positive and negative frequency components. If this signal is sampled by a single A/D converter at a rate that is greater than twice the highest frequency, the resulting spectrum is as shown in Figure 3, with the continuous analogue spectrum repeated around all of the sample frequency spectral lines. The first stage of the DDC is to mix, or multiply, this digitised stream of samples with a digitised cosine for the in-phase channel and a digitised sine for the quadrature channel, generating the sum and difference frequency components.
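A toy version of the DDC chain with the numbers from this post (100 Msamples/s in, decimation by 40, 2.5 Msamples/s out). This is only a sketch: the low-pass filter is a crude moving average chosen for brevity, whereas a real DDC would normally use CIC and FIR stages, and the test tone at 39.5 MHz is an assumed input.

```python
import cmath
import math

def ddc(samples, fs, f_center, decim):
    """Mix to baseband with a complex NCO, average, then decimate."""
    # Multiplying by exp(-j*phase) is the cosine (I) and sine (Q) mixer pair.
    mixed = [s * cmath.exp(-2j * math.pi * f_center * n / fs)
             for n, s in enumerate(samples)]
    out = []
    for start in range(0, len(mixed) - decim + 1, decim):
        block = mixed[start:start + decim]
        out.append(sum(block) / decim)   # crude low-pass: moving average
    return out

fs = 100e6          # input sample rate
f_center = 39.5e6   # middle of the 39-40 MHz band
decim = 40          # 100 MS/s -> 2.5 MS/s

# Assumed test input: a real tone in the middle of the band.
tone = [math.cos(2 * math.pi * 39.5e6 * n / fs) for n in range(4000)]
baseband = ddc(tone, fs, f_center, decim)

print("input samples:", len(tone), "output samples:", len(baseband))
print("first output magnitude:", round(abs(baseband[0]), 3))
```

The output magnitude sits near 0.5 because a real tone splits its energy between positive and negative frequencies and only the difference component survives the filter; the small ripple is the sum component that the simple averager does not fully remove.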

milstar: http://web.mit.edu/6.02/www/f2006/handouts/Lec9.pdf Coherent Detection Requires receiver local oscillator to be accurately aligned in phase and frequency to carrier sine wave
