Xilinx

Xilinx intros Alveo U55C data center accelerator


Xilinx Inc. has introduced the Alveo U55C data center accelerator today.

Neeraj Varma, Director, APAC and Japan Sales, Global Datacenter Business, Xilinx, said that Xilinx invented the FPGA. Its India operations started in 2005, and have since grown significantly, with nearly 1,000 people based in India. Xilinx is currently the no. 1 FPGA/SoC provider for aerospace and defense, no. 1 in FPGA-as-a-service on Amazon Cloud, and the no. 1 logic IC vendor in T&M, and has the world’s first commercial 5G New Radio deployment.

Neeraj Varma.

Xilinx has adaptable hardware architectures. It introduced RF SoCs in 2017, and moved to adaptive compute acceleration platforms (ACAP) in 2018. It has a track record of innovation over the years in hardware and software. The platforms are accessible to all developers. There is a range of deployment methods for customers, such as hardware adaptable devices, deployable end-systems, and FPGA-as-a-service (FaaS).

Xilinx is seeing the data center evolve. Workloads and algorithms are evolving rapidly. Compute is moving closer to data, with storage and network controllers. Xilinx offers adaptable acceleration for the data center, with a comprehensive software and hardware stack for all developers, and a rapidly growing ecosystem.

Nathan Chang, HPC Product Manager, Data Center Group, Xilinx, said that as HPC pushes toward the exascale threshold, power consumption will be the next barrier. Typical HPC architectures will be hard-pressed to deliver acceptable performance per watt. There are limitations with CPU and GPU von Neumann architectures. Data movement challenges cause performance degradation. Data must be prepared in transit between functions to maximize performance. Rigid memory hierarchies create inefficiencies. The net result: wasted clock cycles, less work, more power consumption.

Breakthrough HPC architecture
Today, Xilinx is announcing what it calls a breakthrough HPC architecture. The Alveo U55C is its most capable Alveo HPC accelerator card ever. It comes with a groundbreaking HPC clustering solution that enables massive scale-out across existing customer infrastructure and networks. Full high-level programmability of both application and cluster is available. The scale-out architecture runs on RoCE v2 and DCBx with existing data center server infrastructure. Workloads and memory are shared across multiple cards. MPI enables hyper-parallelism of Xilinx adaptive compute across nodes.

The Vitis unified software platform has domain-specific development environments, Vitis accelerated libraries and APIs, and the Vitis core development kit. It enables data scientists to create recommendation engines with low-touch/no-touch coding. APIs for FEM developers help leverage custom data movement, with no hierarchical memory lock-in, and no cache misses. It enables scale-out across converged Ethernet.

The Alveo U55C is purpose built for HPC and Big Data workloads. Many HPC workloads are either compute or memory bandwidth bound. I/O requirements expand exponentially over time. Power consumption is a huge issue in the data center. HPC needs gravity of compute and high-bandwidth memory. In response, Xilinx built the most powerful accelerator ever, and made sure it scaled easily.

There is more parallelism of data pipelines, superior memory management, optimized data movement throughout the pipeline, and best performance-per-watt. Xilinx adaptive computing for HPC involves clustering at massive scale, built for software developers, and has powerful accelerators.

Nathan Chang.

An HPC signal processing example comes from CSIRO, in Australia, which is working on the world’s largest radio astronomy antenna array. It is built to catalog the origins of the universe, requiring terabits/s of sensor data to be processed in real time. The solution: distributed processing across hundreds of Xilinx Alveo accelerators in real time.

CSIRO is completing a reference design to help other organizations achieve the same success. Another HPC use case is computer-aided engineering (CAE) with Ansys LS-DYNA, a finite element method (FEM) program. It uses FEM to simulate real-world product performance. LS-DYNA lets designers and engineers create simulations of practically unlimited complexity.

Large scale simulations take weeks on a CPU. The x86 architectures aren’t equipped to provide the high I/O and bandwidth required. CPU memory hierarchies are inflexible and that creates unnecessary overhead. The x86 architectures are inherently inefficient at handling data movement.

The U55C brings hyperparallel data pipelining to LS-DYNA. Data is pipelined to simply stream between functions. Data is prepared in transit to achieve the maximum throughput. There are highly composable memory hierarchies, with 16GB HBM2 memory, 32 HBM channels @ 460GB/s, etc. The workload is partitioned across multiple Alveo U55C cards. The result: 5x performance vs. CPU. On real-time graph insights, Xilinx said that real-time results demand acceleration, and that tuned algorithms are required for important real-time graph use cases. The Xilinx U55C is available now.
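The HBM figures quoted above can be put in perspective with some quick arithmetic. The Python sketch below is illustrative only, and assumes that 460GB/s is the aggregate bandwidth across all 32 HBM channels, and that capacity is split evenly between them. The function names are made up for this example.

```python
# Back-of-the-envelope arithmetic for the U55C HBM figures quoted in the post.
# Assumption (not stated in the post): 460 GB/s is the aggregate bandwidth
# across all 32 channels, and the 16 GB capacity is split evenly.

HBM_CAPACITY_GB = 16.0
HBM_CHANNELS = 32
AGGREGATE_BW_GB_PER_S = 460.0


def per_channel_bandwidth(aggregate_gb_per_s, channels):
    """Bandwidth available to each channel if the aggregate is shared evenly."""
    return aggregate_gb_per_s / channels


def per_channel_capacity(capacity_gb, channels):
    """Memory capacity behind each channel if capacity is split evenly."""
    return capacity_gb / channels


def full_sweep_seconds(capacity_gb, aggregate_gb_per_s):
    """Time to stream the entire HBM once at the peak aggregate rate."""
    return capacity_gb / aggregate_gb_per_s


if __name__ == "__main__":
    bw = per_channel_bandwidth(AGGREGATE_BW_GB_PER_S, HBM_CHANNELS)
    cap = per_channel_capacity(HBM_CAPACITY_GB, HBM_CHANNELS)
    sweep = full_sweep_seconds(HBM_CAPACITY_GB, AGGREGATE_BW_GB_PER_S)
    print(f"per-channel bandwidth: {bw:.3f} GB/s")
    print(f"per-channel capacity:  {cap:.2f} GB")
    print(f"full HBM sweep:        {sweep * 1000:.1f} ms")
```

Under those assumptions, each channel gets roughly 14.4GB/s and 0.5GB, and the whole memory can be streamed in about 35 ms at peak rate. Partitioning a workload across the channels matters because no single channel can deliver the aggregate figure.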

The new Xilinx HPC clustering solution enables massive scale-out across existing customer infrastructure and network. The Xilinx Alveo U55C accelerator card, now shipping, brings superior performance per watt to HPC and database workloads, and easily scales through Xilinx clustering. Software developers and data scientists can unlock the benefits of Xilinx adaptive computing through high-level programmability of application and cluster.

Adaptive computing, with innovation accelerated @ Xilinx


Xilinx Adapt 2021 was held recently. Ivo Bolsens, Senior VP and CTO, Xilinx, presented on adaptive computing, with innovation accelerated, and shared the vision for an adaptable, intelligent world.

Ivo Bolsens.

Adaptive computing is proliferating today. It is driving life-changing innovations across many areas. Hardware is adapted and optimized to the app. Xilinx embraces the concept of domain-specific architecture for app acceleration. Compute requirements are growing, and have been doubling every 3.4 months since 2012. Memory requirements are growing too. New apps fuel model size growth to trillions of parameters. Larger models can solve more challenging tasks.
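To get a feel for what a 3.4-month doubling period means, the growth can be written as 2 raised to the power (elapsed months / 3.4). The short Python sketch below works that out. The function names are illustrative, and the only input taken from the talk is the 3.4-month figure.

```python
# Exponential growth implied by a fixed doubling period.
# The 3.4-month doubling period is the figure quoted in the talk;
# everything else here is plain arithmetic.

DOUBLING_MONTHS = 3.4


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Multiplicative growth after `months`, given a doubling period."""
    return 2.0 ** (months / doubling_months)


def annual_growth_factor(doubling_months=DOUBLING_MONTHS):
    """Growth over 12 months for the given doubling period."""
    return growth_factor(12.0, doubling_months)


if __name__ == "__main__":
    print(f"growth per year: about {annual_growth_factor():.1f}x")
    print(f"growth over two years: about {growth_factor(24.0):.0f}x")
```

A 3.4-month doubling period works out to more than a tenfold increase every year, which is far steeper than a doubling every two years would give (about 1.4x per year).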

Communications requirements are growing as well. There are high-level targets for 6G, such as 100Gbps-1Tbps peak data rates, 0.1ms radio latency, connected density of 100 devices per cubic meter, a 100-fold increase in traffic, and a 10-fold gain in energy efficiency.

Xilinx has evolved into a platform company. It has moved to the adaptive compute acceleration platform (ACAP), and now to peer processing. An on-ramp for all developers is now available. For example, Vivado ML is helping raise the level of abstraction, with a 10 percent average QoR gain and a 5X average reduction in hierarchical compile time. There have been 23,000 Xilinx Vitis AI downloads, and 76,000 Vitis downloads.

Computing has evolved to be deterministic, real-time, distributed, etc. Platforms are the key, covering data center, edge, and embedded. You can accelerate storage and networking. SmartSSDs address data at rest, while SmartNICs address data in motion. FPGA-equipped devices can solve data-centric challenges. Xilinx is enabling the composable data center, with an adaptable intelligent fabric. For example, Microsoft has deployed Xilinx across its cloud.

The Xilinx AI engine is at the forefront of the adaptive revolution. The AI engine array has helped extend both multi-threaded and data flow architectures. It represents the next generation of adaptive technology. Bolsens also gave the example of Samsung, with which Xilinx has a partnership. The AI engine has domain-specific variants. Xilinx has also worked with Pacific Northwest National Laboratory (PNNL). James Ang, PNNL, gave a talk about E3SM, the Energy Exascale Earth System Model.

Xilinx is now accelerating the whole app. In automotive, there is FPGA fabric for sensor fusion and pre-processing, AI engines for signal conditioning and low latency AI, and scalar engine for decision making and vehicle control. Xilinx is increasing access at the edge and endpoint. An example is the Kria SOM launch momentum. The demand was overwhelming.

Developer accessibility and productivity are essential. A software development platform for heterogeneous systems is critical for broad adoption. Xilinx has announced extensions for Vitis and Vivado. “We are driving our innovation to accelerate your innovation,” Bolsens said. This is happening across multiple areas, such as robotic surgery, gene sequencing, the Mars Rover, etc.

Xilinx on future technologies of 5G wireless


Day 2 of the Xilinx Adapt: 5G event began with a presentation by Brendan Farley, VP Wireless Engineering & MD EMEA, Xilinx. He spoke about the future technologies of 5G wireless.

There is the disaggregated 5G radio access network, defined by 3GPP and O-RAN. If we look at the new radio units, mmWave technology is very sensitive to the channel. Installation needs to be easy, with low operational costs. The hardware needs to be cost-effective. On the DU side, operators are tied to OEMs. The objective is to move to a server-controlled open approach. The partitioning between the DU and radio unit is key.

Brendan Farley.

The view from the base station is one of massive MIMO. Viewing the cityscape, there is direct line-of-sight access available in most areas. We get the capacity increase by re-using the frequency. Beamforming can be visualized. Services such as video streaming, AR/VR, remote health consultation, etc., are being offered.

There is a need to upgrade the existing 4G sites with 5G mMIMO panels. You need to minimize the hardware TCO. This can be done by virtualizing the baseband and centralizing it to support multiple radios. Advanced silicon technology and integration will be needed. You also need to minimize the opex. There is a need to maximize the RAN performance and capacity. You can also increase the bandwidth, optimize baseband with radio partitioning, and optimize the 5G operator services through acceleration, virtualization, and O-RAN.

In terms of the radio panel power, weight and cost optimization, there is the 320W 64TRX mMIMO panel. It can host increasingly powerful DPD algorithms to linearize the power efficient GaN PAs. GaN has been used successfully in China. You can also reduce the panel heatsink weight through RF power reduction. You can also drive lower TCO by utilizing the most advanced silicon technology with flexibility to enable future 5G/O-RAN system evolution.

GaN power amplifier (PA) technology has been linearized with the Xilinx digital pre-distortion (DPD). Xilinx has created powerful PA algorithms that run on the Xilinx FPGAs. The issue with GaN is its long-term memory effect.

While improvements in LDMOS amplifier characteristics allow for frequency ranges up to 22 GHz, GaN-based amplifiers achieve frequencies up to 30 GHz at power densities up to five times higher, though at higher prices than LDMOS devices.

As for the form factor and weight of the O-RU design, Xilinx’s lidless design yields a 28-degree C improvement over the traditional lidded design. Chinese OEMs are now incorporating lidless technology in 5G mMIMO panels.

Advanced silicon integration
Advanced silicon integration is done on the Xilinx Zynq RFSoC DFE (digital front-end). There is over 2X compute for powerful DPD and up to 400MHz bandwidth. There are hardened and configurable functions for power and cost. Programmable logic also enables customization and future adaptability.

There is increased flexible compute, while reducing TCO. Use cases will definitely evolve over time. Base stations can adapt with the Zynq RFSoC DFE. There can be hardened DFE cores in 5G phase 2. Rel 18 will have adaptable logic as the standards evolve.

You can also scale for capacity. There are studies conducted by Ericsson. In India, the 32TRX seems to have found a sweet spot. The 64TRX has found space in China and Korea.

For O-RAN and O-DU virtualization, O-RAN defines the functional splits between the O-DU (baseband) and O-RU (radio). Xilinx anticipates the further migration of functionality to the O-RU. Functions such as equalization and channel estimation are among those involved.

There is also the 5G mMIMO uplink (UL) performance challenge. MIMO decoder channel correction performance depends on a number of factors. Beamformer discrimination performance matters as well. Future UL performance solutions will bring improved UL beamformer performance. It is essential to have a system that is adaptable and programmable.

The Xilinx 7nm Versal ACAP platform is adaptable technology for next-gen 5G beamforming systems. Its capacity, compute and performance are 5X those of the 16nm device.

In conclusion, the first wave of 5G has provided a clear picture of success metrics and challenges for the next wave. Advanced technology is essential to realize the 5G vision of higher capacity and improved services in an economically viable manner. Next-gen Xilinx technology provides an adaptable platform.

Xilinx aligning to 5G trends


The Xilinx Adapt: 5G event started in the USA today. Liam Madden, EVP and GM, Wired and Wireless Group, Xilinx, kicked off proceedings with the session: Aligning to 5G Trends and Winning the Race.

Many 5G networks are becoming saturated, as of Sept. 2020. Korea saw 2X increase in data consumption, relative to 4G. We expect the lifetime of 5G to be much more than 4G. There are more 5G use cases evolving, with IIoT, Industry 4.0, CV2X (cellular-vehicle to everything), etc., leading the way.

Liam Madden.

Nearly half of the world is still not connected to the Internet. The 5G market disruptions are also enabling new operators and providers.

There are new operators, such as Rakuten, in Japan. It is also scalable from massive-MIMO macrocell to small cell. Xilinx has partnered with TI for SoC devices. Xilinx is an active member of the O-RAN Alliance and the Open RAN Policy Coalition.

Recently, Vodafone shared RF results from OpenRAN radio hardware. Xilinx has also been selected by KMW, South Korea, for its future products. KMW was responsible for the 5G rollout in South Korea. Also, Samsung and Xilinx engineers used high-level compilers. For the Zynq RFSoC DFE, Xilinx got valuable feedback from customers. There is also the 5G Xilinx telco card, which is being evaluated.

Xilinx solutions include SDSoC and SDAccel Environments, Vivado HLS, Multi-level security, Zynq UltraScale+ MPSoC, and UltraScale. The 5G apps targeted are CloudRAN, massive MIMO, backhaul, fronthaul, baseband, and small cell.

5G big picture
Joe Madden, Founder and Chief Analyst, Mobile Experts Inc., presented on The 5G Big Picture: What It Is Really For, and How It Changes the World.

He said that it is difficult to use 10MB speed on cell phones. Things changed when faster clock speed didn’t drive user experience anymore. It’s really about the cost of computing today. The cost per GB of data has been decreasing per generation.

Joe Madden.

With 5G, we are getting much more. Money is spent on more and more GB. Wireless has been a little more expensive than wired alternatives.

Now, cost difference does not really matter. With 5G, wireless is at the crossover point. At least 70-80 percent of traffic on the network is now video. People can stream information at a reasonable price.

Enterprise automation has the potential to grow to the trillion-dollar level. IoT, Industry 4.0, etc., will be among the drivers. There needs to be some maturity in the edge computing and software platforms.

In 5G network deployment, Korea, Japan, and the USA are driving solid growth based on real customer demand. China represents a short-term surge in 5G. The peak is currently surging through. China’s MIIT is delaying the 5G deployment.

In Covid-19, cloud saved the world, while mobile saved the cloud. There was 770 percent increase in cloud instances, and 23X increase in Webex sessions.
There was a 10 percent to 50 percent increase in mobile data traffic. Wireless has picked up the slack. Mobile has accelerated the attack on fixed broadband.

The O-RAN Alliance is now helping Open RAN gain traction. The Rakuten network in Japan has demonstrated how it works. We expect the high-capacity performance to improve. There is going to be some fragmentation with more radio vendors.

There is also the chaos of the trade war between China and USA. Huawei is cut off from American semiconductor shipments, and also from TSMC. SMIC is not a viable solution. Qualcomm has been asking for a waiver to ship SoCs to Huawei. The Chinese government has also delayed 5G build-outs, perhaps, for the next two years, to allow for negotiations at the political level. Overall, 2020 has been a good year, but China has not been surging, as expected. The economics of ASICs vs FPGAs are also changing.

5G enables ‘true cord cutting’. It is convenient, and is also about cutting the TV bundle. We will see different kinds of market growth in the future.

Can specialized AI chips solve AI product challenge?


Xilinx organized a session titled: Can specialized AI chips solve AI product challenge?

Nick Ni, Director of Product Marketing, AI Software Ecosystem, Xilinx, said that there is projected growth of $30 billion approximately by 2023, as per Barclays Research Co. There is an era of domain-specific architecture. AI, deep learning, etc., are far from mature. There are DSAs all around, such as FPGA, Zynq, ACAP, etc. DSA in AI has custom data path and custom precision.

Today, AI apps are everywhere. They are in classification, object detection, segmentation, speech recognition, recommendation engines, anomaly detection, etc. AI is also evolving rapidly. Examples include AlexNet, GoogLeNet, and DenseNet. AI innovation is now pushing for new types of DSAs.

There is a trend toward depth-wise convolutions, which need DSAs with flexible and large device memory, whereas GPUs and typical AI chips have fixed and limited device memory. Custom layers can be given dedicated hardware accelerators in a DSA, rather than falling back to the host CPU. These are just two examples. You can use a sparse neural network (NN) for visual search. Face search is generally dependent on latency. There is also the binarized neural network. A DSA using 1-bit precision can fit an order of magnitude more MACs than one using 8-bit precision.
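The reason 1-bit precision is so compact is that a multiply-accumulate over +1/-1 values reduces to an XNOR followed by a bit count. The Python sketch below illustrates that standard binarized-network trick in software. It is not Xilinx code, and the function names are made up for the example.

```python
# Binarized dot product: values are restricted to +1 / -1 and stored as single bits.
# For two such vectors, dot = (#matching positions) - (#differing positions),
# which is an XNOR plus a popcount instead of n multiplies and adds.


def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed n-element +1/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_word ^ b_word) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n


def reference_dot(a, b):
    """Ordinary multiply-accumulate, used only to check the bitwise version."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = [1, -1, 1, 1, -1, -1, 1, -1]
    b = [1, 1, -1, 1, -1, 1, 1, -1]
    packed = binary_dot(pack_bits(a), pack_bits(b), len(a))
    print(packed, reference_dot(a, b))
```

Eight binarized weights fit in the storage of a single 8-bit weight, and the XNOR-popcount logic is much smaller than an 8-bit multiplier, which is where the order-of-magnitude density claim comes from.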

Adaptable hardware is used to build DSAs. Xilinx has the world’s most responsive AI in MLPerf. Xilinx is just ahead of Intel in performance. Xilinx also achieved the peak TOPS in MLPerf. The power of DSA is immense. An AI application can help detect things, etc. You have to take in hundreds of camera streams and decode them. You have to take on the computer vision, as well. In terms of ADAS/AD, you also have to act on motor control. GPUs and typical AI chips fail to accelerate the full system. DSA is needed to power AI.

As an example, Pony.AI is using Xilinx AI with DSA for autonomous driving. Subaru is using it for ADAS. SK Telecom is using it for speech recognition. Ouster is using it for LIDAR.

Xilinx is also enabling AI scientists. Vitis AI is a deep learning acceleration stack. There are 70,000+ downloads of Vitis and Vitis AI tools since launch in November 2019. Xilinx has had 100+ design wins. Vitis AI is also integrated with Microsoft ONNXRuntime for enabling AI inference. Xilinx has the Adaptive Computing Challenge going on with 1,000+ developers and start-ups registered.

AI productization requires the DSA to meet the production needs for performance, latency and cost. Only adaptable hardware can achieve 100 percent hardware utilization by building a DSA for the exact workload, as proven with the MLPerf v0.7 submission. An AI app is much more than AI, so a DSA is also needed for non-AI acceleration. Now, an adaptable hardware solution enables AI and software developers with Vitis, supporting TensorFlow, PyTorch, etc.

Disruptive innovation with Xilinx Versal ACAP super FPGA


Xilinx Inc. recently announced Versal – the super FPGA. Versal is said to be the first adaptive compute acceleration platform (ACAP), a fully software-programmable, heterogeneous, compute platform.

Built on TSMC’s 7-nanometer FinFET process technology, Versal ACAP combines scalar engines, adaptable engines, and intelligent engines to achieve dramatic performance improvements of up to 20X over today’s fastest FPGA implementations, and over 100X over today’s fastest CPU implementations, for data center, wired network, 5G wireless, and automotive driver-assist applications.

Versal ACAP
Versal is the first ACAP by Xilinx. What exactly is an ACAP? For which applications does it work best?

Victor Peng

Victor Peng, president and CEO, Xilinx, said: “An ACAP is a heterogeneous, hardware adaptable platform that is built from the ground up to be fully software programmable. An ACAP is fundamentally different from any multi-core architecture as it provides hardware programmability, but the developer does not have to understand any of the hardware detail.

“From a software standpoint, it includes tools, libraries, run-time stacks and everything that you’d expect from a modern software-driven product. The tool chain, however, takes into account every type of developer, from the hardware developer, to embedded developer, to data scientist, and to framework developer.”

Differences from classic FPGA and SoC
That means there are technical differences between Versal and a classic FPGA or SoC.

He said: “A Versal ACAP is significantly different than a regular FPGA or SoC. Zero hardware expertise is required to boot the device. Developers can connect to a host via CCIX or PCIe and get memory-mapped access to all peripherals (e.g., AI engines, DDR memory controllers).

“The Network-on-Chip is at the heart of what makes this possible. It provides ease-of-use, and makes the ACAP inherently SW programmable—available at boot and without any traditional FPGA place-and-route or bit stream. No programmable logic experience is required to get started, but designers can design their own IP or add from the large Xilinx ecosystem.

“With regard to Xilinx’s hardware programmable SoCs (Zynq-7000 and Zynq UltraScale+ SoCs), the Zynq platform partially integrated two out of the three engine types (Scalar Engines and Adaptable Hardware Engines).

“Versal devices add a third engine type (intelligent engines). More importantly, the ACAP architecture tightly couples them together, via the Network on Chip (NOC) to enable each engine type to deliver 2-3x the computational efficiency of a single engine architecture, such as a SIMT GPU.”

Does this mean that, in the future, Xilinx will address application engineers as well as classic hardware designers?

He noted: “Xilinx has been addressing software developers with design abstraction tools as well as its hardware programmable SoC devices (Zynq-7000 and Zynq UltraScale+) for multiple generations. However, with ACAP, software programmability is inherently designed into the architecture itself for the entire platform, including its hardware adaptable engines and peripherals.”