PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

SESSION: Keynote

It's Time for a New Old Language

SESSION: Session 1: GPU I

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU

Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

SESSION: Session 2: Concurrency

Checking Concurrent Data Structures Under the C/C++11 Memory Model

An Efficient Abortable-locking Protocol for Multi-level NUMA Systems

Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

Noise Injection Techniques to Expose Subtle and Unintended Message Races

SESSION: Session 3: Tools

Thread Data Sharing in Cache: Theory and Measurement

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications

Processor-Oblivious Record and Replay

SESSION: Session 4: GPU II

Simple, Accurate, Analytical Time Modeling and Optimal Tile Size Selection for GPGPU Stencils

Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters

Model-based Iterative CT Image Reconstruction on GPUs

SESSION: Session 5: Best Papers

Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks

Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations

Tapir: Embedding Fork-Join Parallelism into LLVM's Intermediate Representation

A Multicore Path to Connectomics-on-Demand

SESSION: Session 6: Languages & Compilers

SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap

Synchronized-by-Default Concurrency for Shared-Memory Systems

Function Call Re-Vectorization

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

SESSION: Session 7: Data Analytics

Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions

KiWi: A Key-Value Map for Scalable Real-Time Analytics

Grammar-aware Parallelization for Scalable XPath Querying

Eunomia: Scaling Concurrent Search Trees under Contention Using HTM

SESSION: Session 8: Fault Tolerance

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL

Silent Data Corruption Resilient Two-sided Matrix Factorizations

POSTER SESSION: Session 9: Posters

POSTER: Reuse, don't Recycle: Transforming Algorithms that Throw Away Descriptors

POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization

POSTER: HythTM: Extending the Applicability of Intel TSX Hardware Transactional Support

POSTER: Provably Efficient Scheduling of Cache-Oblivious Wavefront Algorithms

POSTER: State Teleportation via Hardware Transactional Memory

POSTER: IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases

POSTER: Distributed Control: The Benefits of Eliminating Global Synchronization via Effective Scheduling

POSTER: MAPA: An Automatic Memory Access Pattern Analyzer for GPU Applications

POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures

POSTER: Automated Load Balancer Selection Based on Application Characteristics

POSTER: A GPU-Friendly Skiplist Algorithm

POSTER: Poor Man's URCU

POSTER: A Wait-Free Queue with Wait-Free Memory Reclamation

POSTER: STAR (Space-Time Adaptive and Reductive) Algorithms for Real-World Space-Time Optimality

POSTER: Recovering Performance for Vector-based Machine Learning on Managed Runtime

POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models

POSTER: An Infrastructure for HPC Knowledge Sharing and Reuse